Reading Abilities of Business Executives


Carol S. Bellows and Carl H. Rush, Jr. 


Personnel Research Center, Wayne University 


Business executives are exposed to large 
quantities of reading materials every year. 
In 1944 there were 577 commercial and fi- 
nancial digests, news letters, and information 
services. These have probably increased in 
the past 6 years. Books published on business 
and economic subjects number about 500 
annually. There is hardly need to mention 
the thousands of press releases, letters, and 
memoranda that find their way to the execu- 
tive’s desk. 

What do business men read? How well do
they read? After a brief comment in answer
to the first question, this paper will be
concerned with the second one. Data presented
are based on the results of courses in reading
efficiency participated in by 150 executives
between 1947 and 1950.


Reading Interests of Executives 


Responses to a questionnaire circulated in 
1949 to more than 20,000 subscribers to the 
Harvard Business Review throw some light on 
what executives ordinarily read (2). Two 
newspapers were read by virtually all of the 
executives responding. Almost half of them 
subscribed to the Wall Street Journal. The 
New York Times was reported read by 35 per 
cent of the respondents; 75 per cent read one or 
more of the trade publications in their own 
field such as Iron Age or Women’s Wear Daily.
In addition they were likely to read one or 
more magazines of general business interest 
such as Business Week, Fortune, or Time. 
Frequently they resorted to digests and reports 
of longer books and articles. The subjects 
most widely pursued in books included: per- 
sonnel psychology, business management, 
economics, marketing, and accounting. 


The attitude of many was summarized in
the comment of one of the executives: “There
just isn’t time to read one-tenth of all I would
like to read.” One of the possible approaches
to this problem is the training of executives in
effective reading skills. With proper training,
reading speed and comprehension can be
increased, resulting in a saving of time or a
greater coverage of reading materials in the same
amount of time.


The Study 


Participants in the Study. Training in silent 
reading skills was given to over 150 executives 
in two industrial plants, two banks, a large 
department store, and a women’s specialty 
store. Less than 5 per cent of the participants 
were women. Data presented here are based
on those cases for which data were sufficiently
complete to warrant statistical treatment.

The group ranged in age from 22 to 65 years. 
None could be classified as seriously deficient 
in any basic reading skill. Illustrative posi- 
tions held by individuals in the group were: 
vice presidents of banks, merchandising man- 
agers, engineering supervisors, a personnel 
manager, comptroller, and industrial relations 
manager. Election of the course and atten- 
dance were voluntary. 

Design of the Training Course. Meetings 
were held once a week for 10 consecutive weeks. 
Each period was 1½ hours in length. At the
beginning and end of the course, testing took 
up a considerable amount of time. Regularly, 
however, each period was planned to include: 
(1) a reading speed check; (2) a topic for
discussion such as “concentration,” “vocabulary
building,” “skimming,” “reading for
different purposes,” etc.; and (3) a pacing
exercise, i.e., one or two of the Harvard Reading
Films (4) followed by a comprehension
check.

Objective tests used (alternate forms for
initial and final testing) included the Nelson-
Denny Reading Test and the Michigan Speed
of Reading Test. The Michigan Vocabulary
Profile Test was also given so that the
participants might gain some insight into their
particular strengths and weaknesses in
vocabulary. Individual test results were made
available to the participants at both the
beginning and end of the course.


Results


Statistical analyses were conducted to
evaluate the course in terms of reading gains as
measured by standard reading tests. Table 1
shows a comparison between “before” and
“after” results on these tests.


Table 1


Comparison of Measured Reading Performances of 1st with 10th Class Meeting

                                  Before Training   After Training    r, 1st
                                  (1st Meeting)     (10th Meeting)    with     Critical
                                  Mean     S.D.     Mean     S.D.     10th     Ratio

Reading Rate (Speed Check:
  Words per minute)               276.6             439.9                      18.05**
Michigan Speed of Reading Test
  (Forms 1 and 2)                  51.9              56.2
Nelson-Denny Vocabulary Test
  (Forms A and B)
Nelson-Denny Paragraph Reading
  Test (Forms A and B)
Nelson-Denny Total
  (Forms A and B)

 * Significant at the 5% level.
** Significant at the 1% level.


It will be noted that all critical ratios¹ were
significant at the 1% level of confidence with
the exception of the Nelson-Denny Vocabulary
Test. These results suggest that statistically
significant improvements in some reading
skills took place as a result of the reading
course. Large gains in reading rate are shown.
Follow-up studies would be of value as a check
on the permanency of gains.

¹ The formula used to test the significance of
difference between before and after means was:

$$\text{Critical Ratio} = \frac{D}{\sigma_D} = \frac{M_1 - M_2}{\sigma_D}$$
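
As a hedged illustration of the critical ratio used in Table 1, the sketch below computes D divided by its standard error for two correlated means. The standard error of the difference shown here is the usual correlated-means form built from the two standard deviations and the before-after correlation r; treating it as the formula the authors used is an assumption. The function name, the standard deviations, r, and n are hypothetical placeholders, and only the two means come from Table 1.

```python
import math


def critical_ratio(mean_before, mean_after, sd_before, sd_after, r, n):
    """Critical ratio D / sigma_D for two correlated means (assumed formula)."""
    # Standard error of each mean. Using sd / sqrt(n) is an assumption;
    # some texts of the period divide by sqrt(n - 1) instead.
    se_before = sd_before / math.sqrt(n)
    se_after = sd_after / math.sqrt(n)
    # Standard error of the difference between two correlated means.
    se_diff = math.sqrt(
        se_before ** 2 + se_after ** 2 - 2 * r * se_before * se_after
    )
    return (mean_after - mean_before) / se_diff


# The means are the Reading Rate values from Table 1. The standard
# deviations, r, and n are hypothetical placeholders, not study figures.
cr = critical_ratio(276.6, 439.9, sd_before=70.0, sd_after=90.0, r=0.5, n=71)
print(round(cr, 2))
```

With a large sample, a ratio of about 2.58 or more corresponds to the 1% level on a two-tailed normal test, so any ratio in this range would be far beyond it.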

Inspection of the dispersion of before and 
after measures suggested further analysis of 
gains. The reader will note that the before 
and after dispersions on the standardized tests 
show only slight differences. On the Reading 
Rate variable, however, the standard deviation 
increased considerably at the end of training. 
This increase in dispersion may be attributed 
to differential effects of the training on various 
individuals. It was found that increase in
reading speed (words per minute) correlated
-.32 with initial reading rate. In other words,
there was a slight tendency for the slower
readers to gain more in terms of increased
speed than those who began the course at a
faster rate. It was also found that increase
in reading speed correlated -.41 with age.
This suggests a reasonable interpretation that 
the younger trainees tend to gain more from 
the course. 
Table 2


Percentile Equivalents for Mean Scores on Reading
Tests Before and After Training *

                                  Percentile   Percentile
Test                              Before       After

Michigan Speed of Reading            26
Nelson-Denny Vocabulary              75
Nelson-Denny Paragraph Reading
Nelson-Denny Total

* Compared with college seniors.



In order to make the raw score results more 
meaningful, “before” and ‘‘after’’ means were 
converted to percentile equivalents in Table 2. 
In the absence of norms on business popula- 
tions, college senior norms were used. 

During each meeting of the course the
participants were given a timed speed check on one
of a series of standard 1,000 word reading
exercises. Each person kept an individual
graph of the words read per minute in these
speed checks. Results on these reading
exercises are shown in Table 3 for the 71
participants present at all meetings.

Table 4 shows the intercorrelations (Pearson 
Product-Moment) between the various meas- 
ures of reading skill. Two separate matrices 
are given to indicate the intercorrelations 
between the tests before training and after 
training. 

In general, the intercorrelations tend to
increase in the final testing. The low
correlations between Reading Rate and the Michigan
Speed of Reading Test would suggest that
different aspects of reading speed are being
measured by the two tests.


Table 3


Average Weekly Speed of Reading for 71 Business
Executives during a Ten-Week
Reading Course

Meeting      Mean

   1         276.6
   2         290.0
   3         338.6
   4         369.9
   5         387.5
   6         396.7
   7         388.6
   8         415.7
   9         425.5
  10         440.0

Standard Deviation (values as listed): 70.5, 69.9,
57.9, 67.9, 81.0, 75.4, 87.5, 81.2, 90.2


Table 4


Intercorrelation Matrix of Reading Tests Administered
to 61 Business Executives *

                                     1      2      3      4      5

1. Reading Rate (Speed Check)
2. Michigan Speed of Reading        .19
3. Nelson-Denny Vocabulary                                      .90
4. Nelson-Denny Paragraph Reading   .13           .53           .81
5. Nelson-Denny Total Score                       .85    .78

* Coefficients on the left of the diagonal are the
intercorrelations between tests administered during the first
meeting of the course. Coefficients on the right of the
diagonal are the intercorrelations for alternate forms of
the same tests administered during the tenth meeting.


Discussion 


At the beginning of training the executive 
group averaged approximately 275 words per 
minute. They scored on the Michigan Speed 
of Reading Test at the 26th percentile for 
college seniors. This suggests that the group 
was initially rather poor in speed of reading. 
Vocabulary, on the other hand, as measured 
by the Nelson-Denny Test was at the 75th 
percentile for college seniors at the beginning 
of the course. Gains in rate were greater and 
more significant than vocabulary gains as 
measured (Table 1). It might be noted that 
at the first meeting 51, or 71.8 per cent of the 
group read at a rate below 300 words per min- 
ute. At the end of training only 1, or 1.4 per 
cent of the group read below 300 words per 
minute. 
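
To put the gain in rate in practical terms, the following sketch converts the mean speed-check rates into a percentage gain and into the minutes needed for one of the 1,000 word exercises. Only the two means, the exercise length, and the counts of readers below 300 words per minute come from the text; the helper name is illustrative.

```python
def minutes_to_read(words, words_per_minute):
    """Minutes needed to read `words` at a constant rate."""
    return words / words_per_minute


rate_before = 276.6     # mean words per minute, 1st meeting
rate_after = 439.9      # mean words per minute, 10th meeting
exercise_words = 1000   # length of the standard reading exercises

gain_pct = 100 * (rate_after - rate_before) / rate_before
time_before = minutes_to_read(exercise_words, rate_before)
time_after = minutes_to_read(exercise_words, rate_after)

print(f"gain in rate: {gain_pct:.1f}%")
print(f"minutes per exercise: {time_before:.2f} -> {time_after:.2f}")

# Share of the 71 participants reading below 300 words per minute.
share_slow_first = 100 * 51 / 71
share_slow_last = 100 * 1 / 71
print(f"below 300 wpm: {share_slow_first:.1f}% -> {share_slow_last:.1f}%")
```

These figures describe speed on practice exercises only; they say nothing about comprehension at the faster rate.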

Vocabulary training probably requires a 
great deal more time and effort than was 
possible in this course. The gains in
vocabulary reported in Tables 1 and 2 may only
reflect the participants’ increased speed in
reading the test items. A control group might
have shown as much gain.

The causes of slow reading among these
competent adults are probably complex, but some
of them can be suggested: 


1. Carry-over from early childhood of oral
reading habits, word-for-word rather than
phrase reading: “hearing” each word when
reading silently rather than apprehending
connected phrases directly.

2. Over-cautious approach to printed matter
because of fear of losing “something important”:
there is a tendency to note carefully every word 
even when the material does not warrant such 
close scrutiny. Many executives apparently 
had never developed the useful arts of selective 
reading or skimming. The use of key words 
in extracting the most meaning with the least 
effort was apparently not a part of the basic 
reading equipment of most of the executive 
group. 

3. Difficulty in concentration and remember- 
ing: the length of time required to put words 
together into meaningful ideas is perhaps often 
so arduous and time-consuming that material 
presented in the first part of a paragraph is 
“forgotten” before the end of the paragraph 
is reached. Under these circumstances it is 
hard for an individual to maintain interest, 
to concentrate. 

4. Persistence of reading patterns related to 
particular job duties: engineers, accountants 
and similarly trained executives are likely to 
persist in the habit of giving close attention to 
each printed symbol. This over-emphasis on 
the individual symbol is carried over to more 
general reading in books and newspapers. 


Findings of other workers in the area of 
adult reading are similar to those reported here. 
Broxson (1) and Buswell (3) have found that 
the average adult tends to read more slowly 
than is necessary and also tends to read all 
types of material at about the same rate
without regard for level of difficulty. Most
investigators have found their samples of adults
averaging about 300 words per minute.


Summary 

A course in reading efficiency was conducted 
in the Detroit area for several groups of busi- 
ness and industrial executives. Results of 
standard reading tests administered at the 
beginning and end of the course showed slight
but statistically significant gains in rate of
reading and accuracy of comprehension. The
before and after gains in reading rate on
non-test materials, without checks on
comprehension, were quite large. The findings
suggest that business executives can significantly
increase their reading rate on practice exercises. 


Received March 2, 1951. 
Validation of a Correspondence Aptitude Test


Philip H. Kriedt 


Prudential Insurance Company, Newark, New Jersey 


In the validation of industrial tests, it is fre- 
quently difficult to find a single criterion 
which is completely satisfactory. Sometimes 
it seems advisable to use a number of criteria 
in evaluating a test’s effectiveness. This is 
especially true if the validation study involves 
a fairly small sample of individuals and if the 
key for scoring the test is being developed 
empirically using this sample. The following 
report illustrates a validation study of this 
nature in which a variety of criteria were used. 

In order to develop an aptitude test battery 
to predict success in the job of correspondence 
clerk, a study was made of 200 such clerks in 
the Home Office of The Prudential Insurance 
Company. These clerks conduct correspon- 
dence with policyholders, district managers, 
and others. Although their duties are roughly 
similar, their assignments vary considerably 
with respect to job levels as assigned by Job 
Evaluation. Correspondence clerks with high 
job levels conduct correspondence regarding 
complex problems while clerks on lower levels 
handle more routine correspondence. 

In developing a selection battery for this 
job, it seemed desirable to attempt the con- 
struction of a test which would measure in a 
fairly direct manner aptitude for writing clear 
and tactful business letters. A preliminary 
form of such a test was made by describing 
48 business situations requiring a letter to be 
written. For each situation three short para- 
graphs differing in clarity, brevity, and tone 
were prepared that might have been included 
in such a letter. Individuals taking the test 
were asked to select the paragraph that they 
thought most people would prefer to receive 
and also the paragraph they thought most 
people would least like to receive. 


Selection of Criteria 
A combination of the following three different
criteria was used in keying this test: (1)
Ratings of clerks by supervisors; (2) Job
level as an indirect measure of ability of clerks;
and (3) Preferences of a group of employees
selected to be similar in their distribution of
age, sex, and education to policyholders and
others who commonly receive Company
correspondence. These employees were asked to
indicate the paragraphs they themselves would
prefer to receive and least like to receive. It
was not practicable to sample the preferences
of persons who actually correspond with the
Company.

None of these criteria appeared to be ade- 
quate as a single criterion. Supervisory 
ratings probably reflect quite accurately the 
ability of an individual to give correct informa- 
tion in answer to questions and the ability to 
conduct a desirable amount of correspondence, 
but it is not at all certain that supervisors 
themselves know what kinds of letters are 
considered clear and tactful by those who 
receive them; consequently, supervisors may 
not have evaluated this ability properly in 
making their ratings. Job level is greatly 
affected by sex (most women being in lower 
level jobs) and length of experience as well as 
by ability. The employees selected to be re- 
presentative of policyholders and others are 
probably not a completely adequate sample 
to represent this group. 


Development of the Key 


The 200 correspondence clerks were divided 
into two groups of 100 each, matched for 
supervisory rating, sex, and job level. One
group was used for developing a key and the 
other group for cross-validation purposes. 
The group on which the key was developed was 
divided into the high third and low third based 
on an arbitrary combination of supervisory 
rating and job level. 

A key was developed by two successive 
response analyses. First, responses were se- 
lected which showed at least a 12 per cent 
difference in the frequency with which they 
were chosen by the high and low criterion
groups. The 12 per cent figure was an arbi- 
trary one suggested by several empirical studies 
on the development of interest questionnaire 
keys. The writer knows of no practicable 
non-arbitrary method of selecting responses 
for such a key. 

Next, each response which met this first 
requirement had to meet a second requirement. 
The responses of the high ability group had to 
be more like the responses of the “policyholder”
group than were the responses of the
low ability clerks. For instance, if Paragraph
A were ranked third by at least 12 per cent
more high ability clerks than low ability clerks,
then the rank of 3 had to be the most frequent
response for the “policyholder” group in order
for this response to be keyed. 

Eighty-eight of the 172 responses which 
met the first requirement also met the second 
requirement, and these 88 responses, scored 
either +1 or —1, make up the key for the test. 
Since only half the responses which passed 
requirement 1 also passed requirement 2, it 
would seem that the key was developed on 
two quite unrelated criteria: a combined mea- 
sure of supervisory rating and job level, and 
the preferences of a group representative of 
policyholders. 
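
The two requirements can be sketched as a small decision rule. This is an illustration only: the function name, the data layout, and all per cents are hypothetical, and the handling of responses favored by the low ability group (keyed -1 only when the response is not the policyholders' most frequent one) is an assumption, since the text gives an example only for the positive case.

```python
def key_weight(pct_high, pct_low, policyholder_pcts, response,
               min_difference=12.0):
    """Return +1, -1, or 0 for one response to one paragraph.

    pct_high, pct_low: per cent of high and low criterion clerks giving
        this response.
    policyholder_pcts: dict mapping each possible response to the per cent
        of the "policyholder" group giving it.
    """
    difference = pct_high - pct_low
    # Requirement 1: at least a 12 per cent difference between groups.
    if abs(difference) < min_difference:
        return 0
    modal_response = max(policyholder_pcts, key=policyholder_pcts.get)
    if difference > 0:
        # Requirement 2, as in the text's example: the response favored by
        # high ability clerks must be the policyholders' most frequent one.
        return 1 if response == modal_response else 0
    # Assumption (not stated in the source): a response favored by low
    # ability clerks is keyed -1 only if it is not the policyholders'
    # most frequent response.
    return -1 if response != modal_response else 0


# Hypothetical per cents for Paragraph A of one test situation.
policyholders = {"rank 1": 20.0, "rank 2": 25.0, "rank 3": 55.0}
print(key_weight(48.0, 30.0, policyholders, "rank 3"))  # high group agrees
print(key_weight(22.0, 40.0, policyholders, "rank 1"))  # low group favors it
print(key_weight(35.0, 30.0, policyholders, "rank 2"))  # difference too small
```

A response that passes the first requirement but not the second is left unscored, which is what happened to 84 of the 172 responses in the study.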


Estimates of Validity 
It seemed desirable to estimate the validity 
of this key in a number of different ways. 


1. For the cross-validation group of 100 
clerks not used in developing the key, validity 
coefficients were obtained using supervisory 
ratings, job level, and a combination of ratings 
and job level. The following product-moment
correlations were obtained: (a) supervisory
ratings, .38; (b) job level, .30; and (c) ratings
and job level combined, .41. (For the 100
clerks used in developing the key, the
comparable validity coefficients were: (a) .49, (b) .33,
and (c) .57.)


2. The test was given to seven correspondence
experts in a correspondence improvement
section of the Company. Their average
score was .5 standard deviations above the
average for the 100 correspondence clerks in
the cross-validation group. This difference
is significant at the .05 level.


3. Eight research technicians in the Personnel
Research Division independently made
an analysis of the scoring key to discover what
kinds of paragraphs seem to be favored or
penalized by the key. This analysis showed
that the key, although it appears to be a fairly
subtle one, strongly favors the following three
types of replies:


a. Cordial and friendly paragraphs are preferable
to cool and disinterested ones. For instance:


“Thank you for the interest you have ex- 
pressed in our recent campaign for new 
savings accounts. Unfortunately, however, 
our small staff is just plain ‘too busy’ to get
together the materials you ask for. Perhaps 
we will be able to help you later on.” 


is better than 


“I hope we can take care of requests like 
yours at some future time. At present, 
however, we are much too busy to do so.” 


b. Customer-centered replies are preferable 
to Company-centered ones. For instance: 


“I am sure you are constantly looking for
ways to improve the product you sell.” 


is better than 


“I would very much like to make an appointment
to talk to you about the many advantages
of our flour.”


c. Positive replies are preferable to replies
which have a negative emphasis. For instance:


“We have been so rushed in recent weeks 
that our service to customers has necessarily 
suffered. You may be sure that we will 
soon get back to our old schedule and that 
you will be able to count on prompt service 
once again.” 


is better than 


“I am very sorry that you have found it
necessary to make a complaint about our
service. However, I am sure your complaint
is justified as we have not been able to maintain
our usual standards of service lately.”


The key also favors, but not quite so strongly, 
replies which make better use of praise, give 
reasons for an action or condition rather than 
merely stating it, assume the blame rather 
than placing it on the customer, and are less 
awkward and stereotyped. 
This content analysis of the key indicated
that the test is in general agreement with 
frequently accepted principles of good letter 
writing. 

Summary 


In general, it seems that a variety of in- 
dependent methods of evaluation all indicate 
that this test probably has a moderate rela- 
tionship with the ability to write clear and 
tactful correspondence and also with success 
as a correspondence clerk. The test is related
to supervisory ratings of correspondence ability
and to job level for a group of clerks not used
in developing the key. The key, by virtue of 
the manner in which it was constructed, is 
related to the personal preferences of a group 
of employees selected so as to be representative 
of policyholders and others who receive Com- 
pany correspondence. A small group of corres- 
pondence experts score significantly higher 
than correspondence clerks on the test. 
Finally, a content analysis of the key indicates 
that it is in agreement with common principles 
of good letter writing. 


Received April 23, 1951.





Prediction of Department Store Sales Performance from Personal Data * 


James N. Mosel 
The George Washington University 


While extensively used in other fields of 
selling, the personal data blank has received 
little attention in the selection of department 
store sales personnel. Early studies of a few 
characteristics were made by Kitson (2) and 
Anderson (1) using various sales indices as
criteria, but these for the most part were only 
exploratory. More recently, Stead (3) found 
that of 10 different predictors, personal data 
gave the highest correlation with a composite 
of eight non-clerical criteria. Similarly, Stead, 
Shartle et al. (4) report that a personal data 
blank composed of 10 items gave in three 
different samples coefficients of .29, .19, and
.23 with a composite criterion of various
measures of sales success. It is also interesting
to note that of the other predictors tried,
personal data correlated highest with a personality
inventory (.30), while correlations with ability 
measures were all of a very low order and nega- 
tive in sign. This finding has been consist- 
ently reported in other sales fields of prediction 
from personal data. 


The Problem 


In an attempt to improve selection pro- 
cedures in a large department store, a study 
was undertaken to determine whether personal 
data might prove predictive of the sales per- 
formance of women sales clerks. The store 
employs about 4,500 persons in all, and does 
an annual business of approximately 40 million 
dollars. An earlier investigation in the same 
company had shown that personal data could 
satisfactorily predict job tenure. The cri- 
terion of sales performance, however, remained 
to be examined. A system of differential pre- 
diction against several criteria would provide 
a broader basis for the appraisal of applicants, 
and would greatly increase the flexibility with 
which selections could be adjusted to specific 
store needs. 


* The writer wishes to express his gratitude to Mr. 
Richard R. Wade for collection of the data and his part 
in the analysis. 


Procedures 


Criterion. As a criterion of sales performance,
“selling cost per cent” was chosen.
This index is computed for each employee by
dividing total selling cost (salary and
commissions) by total net sales (dollar value of actual
sales).¹ These ratings are essentially a
measure of the dollar worth of an employee to the
company, and are computed semiannually 
by departments. The index has been worked 
out by the store in conjunction with the Na- 
tional Retail Dry Goods Association which 
furnishes member stores with typical selling 
cost per cents for similar stores and goals for 
each department. 
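
The index itself is simple arithmetic. The sketch below follows the definition given above (salary plus commissions divided by net sales, expressed as a per cent); the function name and the dollar figures are hypothetical and are not taken from the store's records.

```python
def selling_cost_per_cent(salary, commissions, net_sales):
    """Selling cost (salary plus commissions) as a per cent of net sales."""
    if net_sales <= 0:
        raise ValueError("net sales must be positive")
    return 100 * (salary + commissions) / net_sales


# Hypothetical six-month figures for two clerks; a lower value is better.
clerk_a = selling_cost_per_cent(salary=900, commissions=100, net_sales=20000)
clerk_b = selling_cost_per_cent(salary=900, commissions=50, net_sales=10000)
print(clerk_a, clerk_b)
```

Because net sales sit in the denominator, a clerk placed in a department with higher-priced goods or heavier demand can show a lower selling cost per cent for the same effort.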

As a measure of job performance, selling 
cost per cent contains an obvious source 
of contamination. Interdepartmental differ- 
ences in the price of goods and consumer de- 
mand create inequalities in the opportunity 
to earn net sales. Thus an employee’s selling 
cost would to some extent be a function of the 
department to which she was assigned. A 
further source of contamination lies in the 
fact that there was a relationship between 
time on the job and selling cost. Previous 
analysis had shown that up to about six 
months there is a steady average decrease in 
selling cost. While the company adjusts for 
these factors “clinically” in evaluating em- 
ployees, it was necessary in the present study 
to partial out these biases before selling cost 
per cent could be used for criterion purposes. 
This was achieved operationally by the method 
of selecting the criterion groups as described 
below. Despite the additional controls re- 
quired for its adoption as a criterion, selling 
cost per cent was felt to be a useful index in- 
asmuch as it forms the basis for most of the 
company’s personnel evaluations and actions.²
It seemed desirable that predictions be made
in terms of it.

¹ For more detailed explanation of these measures,
see 4, pp. 80-81.

² When used by the company to make personnel
decisions, these inequalities are adjusted for
subjectively. Four management representatives equate for
sales volume and make allowances for seasonal
fluctuations and department volume. The reliability and
validity of such adjustments are not known.

In this connection it is interesting to note
that Stead, Shartle et al. (4) report a
reliability of .88 for the selling cost criterion,
based on the correlation between the first and
second half of a year’s sales data. When
corrected by the Spearman-Brown formula
this yielded an estimated reliability for one
year’s data of .94. They also found selling
cost to correlate -.15 with supervisors’
ratings, indicating clearly that the two measures
are reflecting different aspects of department
store selling. The negative sign in this
coefficient does not mean that sales performance
and ratings are negatively related, but simply
reflects the nature of the selling cost per cent
ratio: the higher the selling cost, the poorer
the performance.

Criterion Groups. For purposes of item
analysis it was desired to obtain an upper and
lower criterion group representing the extremes
in sales performance. These contrasting
groups were secured by consulting the
company’s application files for the 1948 fiscal year
and selecting from each of the 85 departments
the sales clerk who had subsequently achieved
the highest and the sales clerk who had
achieved the lowest selling cost per cent. In
this way it was possible to obtain an upper
and lower criterion group, each containing
85 sales women, and roughly equated on
interdepartmental differences in selling opportunity.
This procedure had a further advantage in that
it insured representation of all departments,
a feature which was highly desirable since it
was important to produce an instrument
having applicability to all departments.
Furthermore, only those employees who had
been on the job for at least six months were
chosen for study, thus reducing the effect of
experience.

But despite these controls, the criterion
groups were probably not so contrasting as
they might appear. There was some evidence
of a relationship between department and
caliber of personnel. Through transfers and
placements the better sales persons may have
tended to gravitate toward the more critical
departments, leaving the less capable to
accumulate in others. Consequently, the best
employee in an inferior department might
actually be mediocre on an absolute basis,
while the low employee in a superior
department might actually be quite good. If this
effect was in force, and management
contended that it was, there would be a
displacement of both groups toward the middle and a
consequent reduction in their discriminability.
This would impose a restriction upon the
discrimination power of the personal data when
subjected to item analysis. Results, however,
would be strengthened by this handicap, since
any differentiations obtained would be
underestimates.
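
The Spearman-Brown step-up cited earlier for the selling cost criterion can be checked in a few lines. The sketch applies the standard form of the formula to the half-year reliability of .88 reported by Stead, Shartle et al.; the function name is illustrative.

```python
def spearman_brown(reliability, length_factor=2.0):
    """Estimated reliability when a measure is lengthened by length_factor."""
    k = length_factor
    return k * reliability / (1 + (k - 1) * reliability)


half_year = 0.88                       # correlation between the two half-years
full_year = spearman_brown(half_year)  # two half-years make one year
print(round(full_year, 2))
```

Doubling the period raises the estimate from .88 to about .94, in agreement with the figure quoted for one year's data.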

Item Analysis. The Chi-square test was 
applied to the category frequencies of each 
item to determine whether the high and low 
selling cost responses could represent a homo- 
geneous population. Of the 42 items of in- 
formation submitted by the applicant at the 
time of employment, 12 proved to distinguish 
between the two groups at the .05 confidence 
level. These were: age, years of formal educa- 
tion, years of previous selling experience, 
weight, height, time on last job, time on next 
to last job, domicile, type of principal experi- 
ence, number of dependents, marital status, 
and time lost on job in last two years. 
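
As an illustration of this step, the sketch below computes a chi-square statistic for one item from the category frequencies of the two criterion groups. The counts and category labels are hypothetical, since the item frequencies are not reported; only the group size of 85 and the .05 level come from the text. The value 5.991 is the standard .05 point of chi-square for 2 degrees of freedom.

```python
def chi_square(table):
    """Pearson chi-square for a list of rows of observed frequencies."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical frequencies for one item with three response categories.
# Rows: low selling cost group, high selling cost group (85 clerks each).
age_item = [
    [15, 50, 20],  # low selling cost: under 35, 35 to 54, over 54
    [35, 30, 20],  # high selling cost
]
statistic = chi_square(age_item)
degrees_of_freedom = (len(age_item) - 1) * (len(age_item[0]) - 1)
critical_05 = 5.991  # chi-square .05 point for 2 degrees of freedom
print(round(statistic, 2), degrees_of_freedom, statistic > critical_05)
```

An item whose statistic exceeds the critical value would be retained, as the 12 items listed above were.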

The response categories for each item were 
assigned weights by the “vertical per cent 
method”’ (4, pp. 253-5). This method weights 
each category according to the difference in 
per cent of high and low selling cost employees 
making the response. The per cent differences 
were reduced to simple integral weights by 
Strong’s Table of Net Weights (5), and were 
simplified further by adding a constant to 
eliminate negative signs. 
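
The weighting step can be sketched as follows. The code forms, for each response category, the difference between the per cent of low and of high selling cost employees giving that response, so that categories typical of the better sales clerks score higher. Strong's Table of Net Weights is not reproduced here; dividing the difference by 10 and rounding is only a stand-in for that table, and the frequencies and names are hypothetical.

```python
def category_weights(low_counts, high_counts):
    """Integer weights from per cent differences between criterion groups."""
    low_total = sum(low_counts.values())
    high_total = sum(high_counts.values())
    raw = {}
    for category in low_counts:
        pct_low = 100 * low_counts[category] / low_total
        pct_high = 100 * high_counts[category] / high_total
        # Positive when the category is more typical of low selling cost
        # (that is, better) sales clerks.
        difference = pct_low - pct_high
        # Stand-in for Strong's Table of Net Weights (an assumption).
        raw[category] = round(difference / 10)
    # Add a constant so that no weight is negative.
    lowest = min(raw.values())
    shift = -lowest if lowest < 0 else 0
    return {category: weight + shift for category, weight in raw.items()}


# Hypothetical frequencies for one item (85 clerks per group).
low_group = {"under 35": 15, "35 to 54": 50, "over 54": 20}
high_group = {"under 35": 35, "35 to 54": 30, "over 54": 20}
weights = category_weights(low_group, high_group)
print(weights)
```

An applicant's total score is then the sum of her category weights over the retained items.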


Results 

From the response categories characteristic
of low selling cost employees the following
composite description emerges of the “ideal”
low selling cost sales woman (in order of
discrimination): between 35 and 54 years of age,
13 to 16 years formal education, over five years
previous selling experience, over 160 pounds,
five years or less on next to last job, lives in
boarding house, over five years on last job,
minor executive as principal previous
experience, between 59 and 62 inches in height, one
to three dependents, widowed, and no lost
time in last two years.

To obtain a cross-validation check on the 
scoring key, total scores were computed for 
another sample of 100 present employees. 
Half of this group was drawn from low selling 
cost sales clerks in various departments; the 
other half from the high selling cost employees. 
Table 1 shows the distribution of personal data 
scores for the two groups. There is appreci- 
able discrimination between the two groups as 
evidenced by the clustering of the scores at 
different parts of the score range. The differ- 
ence between the mean scores of the two groups 
is highly significant, the critical ratio being 
5.69. 

Table 2 shows the per cent of high and low 
selling cost employees scoring at and above 
various cutting scores. Thus if a cutoff of 30 
had been adopted at the time of hiring these 
employees, 92 per cent of the low selling cost 
sales clerks would have been retained, while 
56 per cent of the high selling cost group would 
have been rejected. Table 2 also shows the 
per cent of those selected at each cutoff who 
would have been of low selling cost. If the 
cutoff had been set at 30 when the total group 
originally applied for employment, 68 of the
100 applicants would have been selected, of
which 68 per cent would have proved to be of
low selling cost.


Table 1


Distribution of Total Personal Data Scores of 50 Low
and 50 High Selling Cost Women Sales Clerks


Personal Data    Low Selling    High Selling
Scores           Cost Group     Cost Group      Total

50-54                 1                            1
45-49                 1                            1
40-44                13              3            16
35-39                13             11            24
30-34                18              8            26
25-29                 3             11            14
20-24                 1             11            12
15-19                                3             3
10-14                                3             3

Total                50             50           100
Mean


Table 2


Per Cent of Low and High Selling Cost Sales Clerks
Scoring at and above Various Cutting Scores,
and Per Cent Selected at Each Cutting
Score Who are of Low Selling Cost


           Number        Per Cent Low    Per Cent High   Per Cent Selected
Cutting    Accepted      Selling Cost    Selling Cost    Who are of Low
Score      (Total Group) Accepted        Accepted        Selling Cost

50              1              2               0              100
45              2              4               0              100
40             18             30               6               83
35             42             56              28               66
30             68             92              44               68
25             82             98              66               60
20             94            100              88               53
15             97            100              94               51
10            100            100             100               50


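
The arithmetic behind the cutoff of 30 can be checked directly from the percentages given in the text (92 per cent of the low selling cost group retained, 56 per cent of the high selling cost group rejected, 50 clerks in each group). The sketch below is only an illustration; the function name is hypothetical.

```python
def selection_outcome(n_low, n_high, pct_low_accepted, pct_high_accepted):
    """Return (number selected, per cent of those selected who are low cost)."""
    low_kept = round(n_low * pct_low_accepted / 100)
    high_kept = round(n_high * pct_high_accepted / 100)
    selected = low_kept + high_kept
    pct_low = 100 * low_kept / selected if selected else float("nan")
    return selected, pct_low

# Cutoff of 30: 92% of 50 low cost clerks kept, 100 - 56 = 44% of 50 high cost kept.
selected, pct_low = selection_outcome(50, 50, 92, 44)
print(selected, round(pct_low))  # prints: 68 68
```

The same function can be applied to any other cutting score once the two acceptance percentages are known.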
Summary 

Analysis of the application blanks of 170 
women department store sales clerks revealed 
that 12 personal data items significantly distinguished
between high and low selling cost
employees. When applied to 100 present employees,
total weighted personal data scores
showed a substantial relationship to selling 
performance. 

These results are in accordance with other 
findings on department store sales personnel, 
i.e., personality, personal situation and rel- 
evant experience are useful predictors of job 
success. 


Received March 9, 1951. 


Prediction of Academic Success in Dental School


Irving Weiss 


University of Kansas City 


For the past four years the School of Den- 
tistry at the University of Kansas City has 
participated in a nation-wide testing program 
conducted by the Council on Dental Educa- 
tion of the American Dental Association (8). 
One aspect of this testing program is an at- 
tempt to predict success or failure of dental 
students as measured by their grade point 
average in dental school. Since 1946 every 
Freshman class has been given a battery of 
tests shortly after the beginning of the Fall 
semester. 


Test Battery 


The tests in this battery follow the two 
major divisions of the dental curriculum into 
theory and technic work. Theory courses 
cover the basic scientific and theoretical 
groundwork of dentistry, while technic courses 
include the manual, mechanical and clinical 
aspects. Courses in anatomy, oral pathology, 
and orthodontics are indicative of the former, 
while prosthetics, crown and bridge, operative 
and clinical dentistry are representative of the 
latter. Correlations between theory and tech- 
nic grade point averages have been generally 
found to be sufficiently low to encourage the 
development of separate indices for their pre- 
diction. Bellows (1) found a correlation of 
.38 between theory and technic GPA, while 
Wagner (11) at the University of Pittsburgh
found this correlation to be .21. This correla- 
tion was found to range from .30 to .51 for the 
three classes in the present study. 

For the class of 1946, this correlation was 
.30, .51, and .41, respectively, for their Fresh- 
man, Sophomore, and Junior years while for 
the class of 1947, their correlation was .34 and 
.47 for their Freshman and Sophomore years, 
respectively. The Freshman class of 1948 had 
a correlation of .33 between theory and tech- 
nic GPA. 

The test battery as of 1949 consisted of the 
following: 


For the prediction of theory grades. 


1. An intelligence test at the college level, 
divided into a quantitative (Q) and a linguistic 
(L) section. Reliability is greater than .9.

2. A science test prepared especially for the
Council on Dental Education. This is an achievement
test in which the subtest scores are biology,
chemistry, physics, factual information,
application of principles, and total. These 
last three scores are combinations of the first 
three. Reliability data have not yet been 
released. 

3. A test on the interpretation of reading 
materials in the natural sciences. This is 
approximately half of a college level test. No 
reliability data are available. 


For the prediction of technic grades. 


1. Paper and pencil test of object visual- 
ization in three dimensions. Reliability by 
the split half method is reported to be .91. 

2. A carving dexterity test consisting of a 
simple drawing which the student attempts 
to duplicate in chalk, using a carving tool and 
a ruler. The accuracy of the carving is judged
by a board in Chicago. The design is altered 
yearly. No data are available concerning 
test reliability. 


Pre-Dental College Achievement 


In addition to tests as predictors of dental 
grades, another predictive factor should be 
considered, namely, a measure of college 
achievement as indicated by pre-dental grade 
point average. As indicated by previous investigators,
the pre-dental grades are one of the
better predictors of dental school grades. 
Freeman and Smith (3) found a correlation of 
.49 between first year dental theory grade
point average and pre-dental grade point 
average, and a correlation of .31 between first 
year technic criterion and pre-dental grades. 
Wagner (11) found correlations of .47 and .02 
for similar values. Douglas (2), McGrath (6), 
Harris (5), and Graves (4) have found correlations
between dental grade point average
(theory and technic combined) and pre-dental 
grade point average ranging from .34 to .51. 
As indicated in Table 1, the results obtained 
in this study are in general agreement with the 
above findings. Correlations between theory
and pre-dental GPA range from .40 to .54, and 
between technic and pre-dental GPA from .12 
to .21. It is interesting to note that the total 
pre-dental grade point average and the re- 
quired science grade point average are equi- 
potent predictors of dental school achievement 
for the 1946 and 1948 classes, and that for 
these two classes, differences between their 
correlation with dental theory could be as- 
cribed to random fluctuations. For the 1947 
class the correlation of theory and total pre- 
dental grade point average was significantly 
higher than similar correlations with the re- 
quired science grade point average. (Re- 
quired pre-dental science courses include a 
minimum of 28 hours of chemistry, physics or 
biology.) However, Wagner (11) found a correlation
of .48 between theory and required
pre-dental grade point average (required
science courses and one course in English) as
contrasted with a correlation of .22 between
theory and elective pre-dental grade point
average. As only ten to fifteen per cent of 
the dental students are graduates of the Univ- 
ersity of Kansas City, correlations that would 
circumvent the heterogeneous grading systems 
of pre-dental colleges could not be easily made. 


Intercorrelation of Factors 


The correlations of various factors with the 
criteria, grade point average in dental school, 
are given in Table 1. The first order correla- 
tion coefficients were tested for linearity of 
regression by the analysis of variance technique 
(7). At the .05 level, the deviations from 
linearity were not statistically significant. In 
addition, all coefficients greater than .20 are 
statistically different from zero at the .05 
level. With the exception of 1946, the test 
results used in the correlation computation 
were available only in single digit normalized 
scores which ranged from -2 to 9 for the
thirty-nine participating dental schools. As 
the local data seldom ranged over seven intervals,
Sheppard’s correction for broad grouping
was employed in calculating the first order
correlations. The multiple correlation coeffi- 
cient was calculated by the Wherry-Doolittle 
method of test selection (10). 
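
Sheppard’s correction, as it is usually stated, subtracts one-twelfth of the squared class-interval width from each variance before the correlation is computed; the covariance is left unchanged. The sketch below illustrates that textbook form under the assumption of unit-width intervals. The function name and the scores are hypothetical, and the article does not say exactly how the correction was carried out.

```python
import math

def pearson_r_with_sheppard(xs, ys, width_x=1.0, width_y=1.0):
    """Return (raw_r, corrected_r) for paired scores recorded in broad groups."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    raw_r = cov / math.sqrt(var_x * var_y)
    # Sheppard's correction: remove the grouping variance h^2 / 12 from each variable.
    adj_x = var_x - width_x ** 2 / 12
    adj_y = var_y - width_y ** 2 / 12
    if adj_x <= 0 or adj_y <= 0:
        raise ValueError("grouping too coarse for Sheppard's correction")
    corrected_r = cov / math.sqrt(adj_x * adj_y)
    return raw_r, corrected_r

# Hypothetical single-digit scores for seven students (not data from the study).
test_scores = [1, 2, 3, 4, 5, 6, 7]
grade_bands = [2, 1, 4, 3, 6, 5, 7]
raw, corrected = pearson_r_with_sheppard(test_scores, grade_bands)
print(round(raw, 3), round(corrected, 3))
```

Because the correction shrinks both variances, the corrected coefficient is always somewhat larger in absolute value than the raw one, and the difference grows as the number of intervals falls.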

These results indicate a moderate positive 
relationship between dental school theory 
grades and the predictor variables. In the 
prediction of theory grade point average, some 
form of pre-dental grade point average and 
sections of the science test are the most fruit- 
ful. Except for the class of 1947, differences 
in the correlation between the theory criterion 
and the pre-dental science grade point average, 
as contrasted with the correlations between 
the same criterion and the total pre-dental 
grade point average, can be ascribed to random 
errors in sampling. The chemistry and bio- 
logy sections of the Science test significantly 
increase one of the multiple correlation coeffi- 
cients. However, the higher correlations with 
the theory criterion of these sections of the 
science test as compared with the other sec- 
tions, may in some cases be due to chance 
errors in sampling. The addition of any 
further tests either adds more chance errors 
than actual validity to the test battery, or 
does not increase the size of the multiple cor- 
relation coefficient by any significant amount. 
The low correlation of Q and L scores with 
theory grades has also been found by Wagner 
(11). However, Peterson (9) has reported 
this correlation to be .56 for one dental study. 
It was hoped that this test could have been 
used as a suppressor variable, but this did not 
prove to be the case. The reading test showed 
a moderate correlation with theory grade point 
average, as well as a rather high correlation 
with pre-dental grade point average and the 
chemistry section of the science test. As a 
result of this correlation between the reading 
test and grade point average, the addition of 
the reading test to the test battery did not 
increase the size of the shrunken multiple 
correlation coefficient. 
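
A shrunken multiple correlation allows for the capitalization on chance that comes from fitting several predictors to one sample. The sketch below uses a commonly quoted adjustment, 1 - (1 - R^2)(N - 1)/(N - k - 1); whether the study used exactly this form is an assumption, and the input values are illustrative only.

```python
import math

def shrunken_r(r, n, k):
    """Multiple R adjusted for sample size n and number of predictors k."""
    adjusted_r2 = 1 - (1 - r ** 2) * (n - 1) / (n - k - 1)
    # An adjusted value below zero is conventionally reported as zero.
    return math.sqrt(max(adjusted_r2, 0.0))

# Illustrative values: R = .60 from three predictors in a class of 100.
print(round(shrunken_r(0.60, 100, 3), 3))
```

Adding a test raises the unadjusted R slightly in any sample, so a test is worth keeping only if the shrunken value also rises.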


Technic Criteria 


The correlations between technic criteria
and the carving and object visualization tests
were significantly lower than those between
the theory criteria and several of the corresponding
theory tests. In the junior and
sophomore years in Table 1, the correlation
figures for the object visualization test could
be from populations whose parameter values
are zero. Wagner (11) reports correlations of
.14 and .22 with technic grades for two freshman
classes (N is approximately 100 in each
class). In the present study the carving test
gave a uniformly moderate correlation with
technic criteria ranging from .24 to .35.
Wagner (11) found for the above mentioned
classes correlations of .30 and .43.


Table 1


Correlation of Tests and Pre-dental Grade Point Average (GPA) with Dental Theory
and Technic Grade Point Average (GPA)


Class of 1946
N = 106
Junior GPA†

Theory Technic

Pre-dental GPA
Pre-dental Science
GPA (required)
Science Test Bio.
Chem.
Phy.
Fact.
Appl.
Tot.
Reading Test
Intelligence (Q)
(L)
(T)

42
45

ao
42

32
33
35
Obj. Vis. 09
Carving Test 12
Rais = .63; Roy = .52; and Ra ow = .56.

* Not given in 1946.

Class of 1948
N = 97
Freshman GPA†

Class of 1947
N = 102
Sophomore GPA†
Theory (b) Technic
40 20
44 |

Theory (a) Technic
54
43

34
47
.28
40
40

32
AT
18
46
.28
36
40
00

18

00
14
09

02

ll
A3

24
16

† Grade Point Average for any year is based on grades earned in that year only.


Table 2 


Intercorrelation between Yearly Theory Grade Point
Averages and Intercorrelation between Yearly
Technic Grade Point Averages *


                              Theory    Technic

Class of 1946
  Junior vs. Sophomore         .81
  Junior vs. Freshman          .68

Class of 1947
  Sophomore vs. Freshman       .72


* Grade point averages for any year are based on 
grades earned in that year only. 


The results of Table 2 show a fairly high 
relationship between the various years of 
theory work. A prediction based on one 
year’s work in theory is likely to hold for the 
remaining three years of theory work. How- 
ever, the correlation between technic grade 
point averages for the various years is signifi- 
cantly less than similar intercorrelations based 
on theory grades with the result that pre- 
diction of technic grades based on one year of 
such work does not appear to be too reliable. 
This may be due to the greater heterogeneity 
in the technic curriculum. 


Screening of Dental Applicants 


The use of the test battery and pre-dental 
grades for the selection of future dental stu- 
dents presents several problems in addition 
to the magnitude of the correlation coefficients. 
By the analysis of variance techniques, the 
variation in the means of the test scores from 
year to year is such that the three or four 
dental classes (test results but not grades were 
available for the freshman class of 1949) cannot
be considered samples from the same population.
Moreover, a comparison of similar
intercorrelation coefficients among the several 
independent variables for 1947 and 1948 in- 
dicated that the differences could hardly be 
attributed to random sampling errors. The 
differences among the multiple correlation 
coefficients are also statistically significant. 
As a result, the assignment of relative weights 
to the independent variables for the selection 
of future dental students appears at the pre- 
sent time to be a rather approximate arrange- 
ment. 

In addition to the requirement of linearity 
of regression, the distribution of the criteria 
and the predictor variables should be at least 
approximately normal for predictive purposes. 
All the variables fulfill the requirements of 
linearity and normality with the exception of 
the technic criterion, which meets the linearity
requirement but is not normally distributed. This lack of normality
makes the prediction of technic work from any 
variable decidedly tenuous for the data in this 
study. 

The giving of the test after the students had 
been admitted to dental school may raise some 
questions as to the validity of these scores. 
It is conceivable that once in dental school,
the motivation of the group may vary both as
to level and diversity when compared to the
situation in which the tests were used as a basis
for admittance. In an attempt to determine
the relationship between level of motivation 
and test scores, a questionnaire was randomly 
distributed to 20 per cent of the members of 
each class. This questionnaire asked the
student to check one of the following: tried
your hardest, tried hard, made an average
effort, made a weak effort. The results were:
73 per cent stated that they tried their hardest 
or at least hard; 24 per cent made an average 
attempt, and 3 per cent made a weak effort. 
No significant correlation or difference was 
found between level of endeavor and a com- 
posite of the test scores. Responses on the 
questionnaire, recorded twenty to forty months 
after the tests were administered, are open to 
the usual skepticism regarding this method of 
measuring motivation. It is felt that the
administration of these tests in 1951 prior to
admission to dental school could conceivably
be an additional factor which cannot be accounted
for at the present time.


Summary 


The present study has found that a combin- 
ation of pre-dental grades and some sections 
of the science test will give moderately high 
correlations with dental theory grades. If 
used in the selection of dental school appli- 
cants, of whom 20 to 40 per cent are selected, 
these correlations will help in eliminating 
potentially poor theory students. The non- 
normal distribution of technic grades in this 
study prevents at present a similar use in the 
technic field. 


Received April 2, 1951.


Prediction in Auto Trade Courses *


Isaac T. Littleton 
Chapel Hill, North Carolina 


Within the past 30 years interest in me- 
chanical aptitude and its measurement has in- 
creased tremendously. Psychologists have 
attempted to determine the factors that under- 
lie this aptitude, developed tests to measure 
some of them, and validated many of these 
tests for specific mechanical occupations. 

There has been little attempt, however, to 
compare the validities of tests which are de- 
signed to measure similar traits and to compare 
standard single tests with the subtests of 
mechanical aptitude batteries. 

This study was undertaken to compare the 
validities of two similar mechanical aptitude 
batteries with that of a combination of four 
selected single tests. Secondary aims were: 
(1) to compare the effectiveness of subtests 
and single tests, and their relative weights in 
the total batteries; and (2) to determine the 
values of the tests and batteries in predicting 
success in training in courses in Auto Mechanics
and Auto Body Repair and Painting in
a technical trade school.


The Tests

The aptitude batteries were chosen on the 
basis of two criteria: (1) their suitability for 
the trade school students; and (2) the apparent 
similarity in factorial content of their subtests. 
The two batteries chosen were: 


1. The SRA Mechanical Aptitudes, Form 
AH, (7) which has three subtests: Mechanical 
Knowledge; Space Relations; and Shop Arith- 
metic. 

2. The California Prognostic Test of Mechan- 
ical Abilities, Form A, (8) which has five 
subtests: (1) Arithmetic Computation; (2) 
Reading Simple Drawings and Blueprints; 
(3) Identification and Use of Tools; (4) Spa- 
tial Relationships; and (5) Checking Measure- 
ments with a Ruler. Tests 1, 2 and 5 contain 
items similar to the SRA Shop Arithmetic 
subtest. Test 3 is similar to the Mechanical
Knowledge subtest of the SRA battery, and
test 4 is similar to the SRA Space Relations
subtest.


* Based on data from a thesis submitted to the Committee
on Graduate Studies of the University of
Tennessee in partial fulfillment of the requirements for
the degree of Master of Arts.


Two criteria were used also in selecting the 
single tests: (1) they should measure about 
the same abilities or factors which the subtests 
in the batteries measure; and (2) they should 
be widely used standard tests with known 
validities for mechanical occupations. The 
following tests were chosen: 

1. The Bennett Test of Mechanical Compre- 
hension, Form AA (1). This test was used to 
get at the same component as the Mechanical 
Knowledge subtest of the SRA battery and the 
tool usage test of the Prognostic Test of Mechan- 
ical Abilities. 

2. The Revised Minnesota Form Board, Series 
MA (5), was selected because it is similar 
to the spatial relations subtests in the two 
batteries. 

3. The Purdue Industrial Training Classification
Test, Form A (4). It has items similar
to those of the Shop Arithmetic subtest of the
SRA battery and to those of subtests 1, 2 and 5
of the Prognostic Test of Mechanical Abilities.

4. The O'Rourke Survey Test of Vocabulary, 
Form X4 (6). This test was included to pro- 
vide a check on the verbal intelligence of the 
subjects and to determine how much a vocabulary
test would add to the predictive power of
the battery. 

The tests were administered in conformity 
with the instructions and time limits given in 
each test manual. The six tests were admin- 
istered over a period of two months—from 
July 21, 1949 to September 22, 1949, on an 
average of two weeks apart, in five different 
testing sessions. 


The Subjects and the Trade School 


The subjects in this investigation were 
students at the Knoxville Trade School in 
Knoxville, Tennessee. They were training 
for two automotive trades: namely, Auto 
Mechanics and Auto Body Repair and Painting. 
Due to the rapid turnover of students, and the 
rotation of instructors every three months 
from one class to another, a student, during 
his 18 months of training, was taught by 
several different instructors. It was, there- 
fore, possible to obtain more than one rating 
for each student. 
The ages of the students ranged from 18 to
50 years. Their education ranged from no 
schooling through 12 years. 

No student was included in the sample who 
had not been in training for at least two months 
when the criterion of success was obtained. 
The mean number of months of attendance was
about the same for both groups: 11.3 months for the
Auto Mechanics and 11.2 months for the Auto
Body Repairmen.

Only students who were rated by two or 
more instructors were included. In the final 
sample, there were 85 subjects in the Auto 
Body Repair group and 105 in the Auto Mech- 
anics group. 

The Criterion 

The criterion of success in each course was 
obtained by having all instructors in the 
courses rank as many students as they knew 
well enough to rank fairly accurately, on the 
basis of the students’ over-all abilities to learn 
the trades. Eight Body Repair instructors 
ranked the Body Repair students, and nine 
Auto Mechanics instructors ranked the Auto 
Mechanics students. Each instructor ranked 
a different number as well as a different group 
of students. Each student received at least 
two ratings and some as many as six. 

With each instructor ranking a different 
number and a different group of students, it 
was necessary to give each set of rankings an 
equivalent meaning by placing them on a 
common scale. A table for conversion of 
ranks into normalized scores prepared by 
Larson (3, 491-492) was used for this purpose. 
Each rank was converted into a linear score 
on a 100-point scale. The scaled scores con- 
verted from the rankings for each trainee were 
averaged to obtain the final criterion scores. 
The mean of the criterion scores for the Auto 
Mechanics group was 49.4; while that for the 
Auto Body Repair group was 49.9. The 
standard deviation of the criterion scores for 
the Auto Mechanics was 14.8; while that for 
the Auto Body Repairmen was 16.2. 
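
The conversion of ranks to a common scale can be sketched as follows. Larson's table itself is not reproduced in the article, so this is only an approximation of the general technique: each rank is turned into a proportion of the group standing below it, that proportion into a normal deviate, and the deviate into a scaled score. The scale mean of 50 and standard deviation of 20 are assumptions, as are the function names.

```python
from statistics import NormalDist

def rank_to_scaled_score(rank, n_ranked, mean=50.0, sd=20.0):
    """Convert a rank (1 = best) among n_ranked students to a normalized score."""
    if not 1 <= rank <= n_ranked:
        raise ValueError("rank must lie between 1 and n_ranked")
    # Proportion of the ranked group falling below the midpoint of this rank.
    proportion_below = (n_ranked - rank + 0.5) / n_ranked
    z = NormalDist().inv_cdf(proportion_below)
    return mean + sd * z

def criterion_score(rankings):
    """Average the scaled scores from several instructors' (rank, n_ranked) pairs."""
    scores = [rank_to_scaled_score(rank, n) for rank, n in rankings]
    return sum(scores) / len(scores)

# Hypothetical trainee ranked 2nd of 12 by one instructor and 5th of 20 by another.
print(round(criterion_score([(2, 12), (5, 20)]), 1))
```

Placing every instructor's ranks on the same scale is what allows a rank of 2 in a group of 12 to be averaged with a rank of 5 in a group of 20.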

The reliability coefficients of both sets of 
criterion scores (criterion reliability) were com- 
puted by a formula developed by Cureton (2). 
The reliability of the Auto Mechanics scores 
was .62; that of the Auto Body Repair and 
Painting scores was .82. 


Effect of Training 


Since there was a range of from 2 to 17 
months of attendance in the courses, it was 
felt that this factor of length of training may 
have affected the instructors’ judgments of the 
students’ abilities, and the performances of the 
students on the tests. For that reason, the 
number of months of attendance was correlated 
with the criterion scores and the scores on the 
tests. 

The only test that correlated significantly 
with months of attendance was the Bennett
Test of Mechanical Comprehension for the Auto 
Mechanics. This correlation coefficient was 
.28, significant at the one per cent level. 

Partial correlations were computed for all 
inter-test and criterion-test correlations, with 
months of attendance held constant. All the 
differences between the partial correlations 
and the zero-order correlations were less than 
.03. The standard error of a correlation of 
zero for the Auto Mechanics group is .10, and
for the Auto Body Repairmen, .11. Because
of the small differences between the zero- 
order and partial correlation coefficients, it 
was concluded that the effect of months of 
attendance on inter-test and criterion-test 
correlations was negligible. Even the signifi- 
cant correlation between months of attendance 
and score on the Bennett became insignificant 
in the partial correlations involving this test. 
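
Holding months of attendance constant corresponds to the first-order partial correlation. The sketch below shows the standard formula. The value .28 is the Bennett-attendance correlation reported above; the other two coefficients are hypothetical, since the zero-order values are given only in the thesis.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y with z held constant (first-order partial)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# x = test score, y = criterion, z = months of attendance.
# r_xz = .28 is from the text; r_xy = .50 and r_yz = .10 are assumed for illustration.
zero_order = 0.50
partial = partial_r(zero_order, 0.28, 0.10)
print(round(partial, 3), round(abs(partial - zero_order), 3))
```

When the third variable correlates only weakly with one of the other two, as assumed here, the partial and zero-order coefficients differ very little.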


Statistical Procedure 


The two training groups were treated separ- 
ately in the statistical analysis. 


1. The subtests of each battery were cor- 
related with the other subtests in the battery 
and with the criterion. 

2. Each single test was correlated with each 
of the other single tests and with the criterion. 

3. Beta weights for each subtest and each 
single test were computed by the Doolittle 
Method. 

4. Multiple correlation coefficients for each
battery and for the combination of single tests 
were computed. 

5. The criterion-test correlations were cor- 
rected for criterion attenuation, using the 
criterion reliability coefficients reported above. 

6. The six multiple correlations were reworked
to determine multiple correlation
coefficients corrected for criterion attenuation.


Table 1


Multiple Correlations and Bennett-Criterion
Correlations


                              Auto Mechanics          Auto Body Repair

                                     Corrected                Corrected
                                        for                      for
                              Raw    Attenuation      Raw     Attenuation

SRA                                     .52           .48        .52
Prognostic Test of
  Mechanical Abilities                  .58           .49        .54
Single Tests                            .62                      .59
Bennett                                 .62


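
Steps 3 to 6 of the procedure can be illustrated for the simplest case of two predictors. The Doolittle method is a hand-computation routine for solving the normal equations; the sketch below substitutes the closed-form two-predictor solution, which gives the same beta weights in that case. Function names are hypothetical. The attenuation example uses a raw correlation of .49 and a criterion reliability of .82 purely as illustrative inputs.

```python
import math

def beta_weights_two_predictors(r1c, r2c, r12):
    """Standardized weights for two predictors from their validities and intercorrelation."""
    denom = 1 - r12 ** 2
    beta1 = (r1c - r2c * r12) / denom
    beta2 = (r2c - r1c * r12) / denom
    return beta1, beta2

def multiple_r(betas, validities):
    """Multiple correlation: square root of the sum of beta times validity."""
    return math.sqrt(sum(b * r for b, r in zip(betas, validities)))

def correct_for_attenuation(r, criterion_reliability):
    """Remove the effect of criterion unreliability from a validity coefficient."""
    return r / math.sqrt(criterion_reliability)

# Hypothetical subtest validities of .40 and .30 with an intercorrelation of .50.
betas = beta_weights_two_predictors(0.40, 0.30, 0.50)
print(round(multiple_r(betas, (0.40, 0.30)), 3))

# Illustrative correction: raw r of .49 with a criterion reliability of .82.
print(round(correct_for_attenuation(0.49, 0.82), 2))  # prints: 0.54
```

Because every criterion correlation is divided by the same factor, correcting the validities first and then recomputing the multiple correlation raises it by that same factor.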


Results 


The basic findings are reported in Tables 
1, 2, 3, and 4. Only the criterion-test correlation
coefficients corrected for attenuation are
reported here. The zero-order correlations 
are reported in the author’s thesis, which is on 
file at the University of Tennessee. 


1. For both groups, the highest correlation 
between any single test or subtest and the 
criterion was given by the Bennett Test of
Mechanical Comprehension. These correlations
were higher than the multiple correlation
coefficients of either of the two standard 
batteries. The other three single tests, maxi- 
mally weighted, did not increase significantly 
the multiple correlation coefficients of the 
single tests over the Bennett-criterion correla- 
tion, for either training group. 

2. All three batteries were significantly 
correlated with the criterion scores, in both 
groups, when each subtest or single test was 
maximally weighted. 

3. The combination of single tests gave the 
highest multiple correlation coefficients, with 
the California Prognostic Test of Mechanical 
Abilities second, and the SRA Mechanical
Aptitudes third. This order prevailed for
both training groups. 

4. Of all the single tests, the vocabulary 
test contributed least to the multiple correla- 
tions in both groups. This would indicate 
that little may be gained by including a voca- 
bulary test in a battery for predicting this 
criterion. 

5. The multiple correlation coefficients ob- 
tained by using the criterion-test correlations 
after they were corrected for attenuation were 
raised considerably in all cases, thus yielding 
validity coefficients, as distinct from coefficients
of predictive power.


Table 2 


Single Tests †
Intercorrelations and Criterion Correlations of Four Single Tests, with Means,
Standard Deviations, and Beta Weights


Bennett (AA)   Minnesota Form Board   O’Rourke Vocabulary   Purdue Industrial Training   Criterion   S.D.

Bennett (AA) .53
Minnesota Form Board
O’Rourke Vocabulary 5 49
Purdue Industrial Training 5 53
Criterion .42*
31.3
12.3
D>

Mean
S.D.
Beta

46 Al .62* 5. 6.5
46 . aa” 10.0
.29* 11.0
.29* : 4.9
14.8
49.9 RM = .62* ††
16.2 RB = .59* ††


* Corrected for criterion attenuation.

† The figures in the upper right part of the table are for Auto Mechanics trainees (N = 105). Those in the
lower left part are for Auto Body Repair and Painting trainees (N = 85).

†† RM means the multiple correlation coefficient for the Auto Mechanics trainee group; RB means the multiple
correlation coefficient for the Auto Body Repair and Painting trainee group.


Table 3


SRA Mechanical Aptitudes †


Intercorrelations and Criterion Correlations of the Subtests of the SRA Mechanical Aptitudes Battery,
with Means, Standard Deviations, and Beta Weights








Mechanical 


Space 
Knowledge 


Relations 
44 

51 — 

16 .26 

44* .38* 


25.3 16.4 
8.6 5.8 
aa .16* 


Shop 
Arithmetic 

39 

.34 


32 


6.9 
4.4 
a 


M 
29.2 


Criterion 
50" 
.34* 
27 


Mechanical Knowledge — 
Space Relations 

Shop Arithmetic 

Criterion 


78 





Mean 
S.D. 
Beta 

* Corrected for attenuation.

† The figures in the upper right part of the table are for Auto Mechanics trainees (N = 105). Those in the
lower left part are for the Auto Body Repair and Painting trainees (N = 85).

†† RM means the multiple correlation coefficient for the Auto Mechanics trainee group; RB means the multiple
correlation coefficient for the Auto Body Repair and Painting trainee group.


Table 4 


Prognostic Test of Mechanical Abilities †
Intercorrelations and Criterion Correlations of the Subtests of the Prognostic Test of Mechanical Abilities,
with Means, Standard Deviations, and Beta Weights





Measure- 
ment 
Space witha 
_Relations Ruler 


.23 
07 
19 


Tool 
Usage 
42 
A 


Arith- Blueprint 
‘metic Reading 


30 


Criterion 
.29* 
45* 
.39* 
.03* 
.50* 

.18* —- 


7.2 


Arithmetic 
Blueprint Reading 
Tool Usage 


34 
44 
44 
04 


59 
57 AT 
39 42 
.28* A7* 
16.2 7.2 
2.4 4.3 
40* —.08* 


Space Relations 
Measurement witha Ruler 
Criterion 

Mean 

S.D. 

Beta 





6.3 
2.9 
— .26* 


49.9 
16.2 


Ry = :! 
Ra 


* Corrected for attenuation.

† The figures in the upper right part of the table are for Auto Mechanics trainees (N = 105). Those in the
lower left part are for the Auto Body Repair and Painting trainees (N = 85).

†† RM means the multiple correlation coefficient for the Auto Mechanics trainee group; RB means the multiple
correlation coefficient for the Auto Body Repair and Painting trainee group.


6. After the unreliabilities of the two sets
of criterion scores were removed by correcting
for attenuation, the batteries and single tests
predicted about equally well for both trade
groups. There is a difference of only .003
in the SRA multiple correlations for the two
trades. However, the Prognostic Test of
Mechanical Abilities, the Bennett test, and
the combination of single tests yielded slightly
higher correlations for the Auto Mechanics
group than for the Auto Body Repair group.

7. The evidence is not conclusive enough 
to make positive statements about the rela- 
tive values of the various tests and components 
in the prediction of the criterion. A compari- 
son of the correlation coefficients will give 
indications of these values, however. These
data tend to confirm other findings that 
mechanical knowledge is an extremely im- 
portant component in mechanical aptitude. 
The space relations tests also yielded some- 
what higher correlations than did the arith- 
metic and blueprint reading tests. 


Received March 5, 1951. 


Accuracy of Students’ Reported Honor Point Averages *


Marvin D. Dunnette 


Industrial Relations Center, University of Minnesota 


The scholastic achievement of university 
students is most often denoted in terms of a 
derived index or honor point average. In 
many counseling situations, a student’s trans- 
cript of courses is not immediately available 
and reliance must be placed on his own report 
of his honor point average. The accuracy of 
this report depends not only on the degree of 
conscious or unconscious distortion which he 
may introduce into his statement, but also 
on the degree to which he actually knows the 
true value of his index of achievement. The 
present analysis was undertaken in an effort to 
determine the amount of confidence which 
may be placed in such reports. 


Previous Studies 


Previous studies (1, 2, 3, 4) have been con- 
cerned with the validity and reliability of 
information gained in a variety of situations 
involving reports by students and adults 
concerning scholastic standing, work histories, 
age, previous earnings, etc. 

Vaughn and Reynolds (4) were interested
in determining the reliability of public opinion 
interviewers’ reports of age, education, and 
socio-economic level. The results of their 
study indicate that interviewer ratings of 
socio-economic level are much less reliable 
than those for age and education. The reliability
of reported age and education was
gratifyingly high. 

Keating, Paterson, and Stone (1) were 
directly concerned with the validity of work 
history information as reported by unemployed 
persons. The study was conducted in a guid- 
ance setting, and presumably there was little 
incentive toward distortion of facts. Validity 
indices were obtained which ranged from +.90 


* This study was completed while the author was 
Teaching Assistant in the Institute of Technology, 
University of Minnesota, 1950-51. It was a portion 
of a larger study designed to predict aptitude for 
graduate work in engineering conducted in partial ful- 
fillment of M.A. requirements in personnel psychology. 
to +.98 for such details of work history as 
wages, duration of job, and job duties. It is 
evident that these workers were well aware of 
the factual details of previous jobs and that 
they reported them with high accuracy. 

Krueger’s study (2) is related to the falsifi- 
cation of test scores among college students. 
When grading errors were made in students’ 
test papers, only 10 per cent of the students 
whose papers were graded too high reported 
the discrepancy, whereas 99 per cent of stu- 
dents graded too low discovered and reported 
the errors. The evidence is decisive that 
“self-interest” dictates the accuracy with 
which such errors are reported. 

Paterson and Thornburg (3) checked the 
accuracy with which entering college of engi- 
neering freshmen reported their scholastic 
standing with reference to whether they were 
in the top third, middle third, or lower third of 
their graduating class. The results clearly 
indicated that the reports of the freshmen 
showed little correspondence with their actual 
standing in high school. Paterson and Thorn- 
burg concluded that such self-reports should 
not be depended on as being valid. 

The report on the accuracy of work histories 
and on the reliability of pollsters’ reports of 
age and education of respondents would lead 
us to place great confidence in facts obtained 
from persons. However, the studies of Krue- 
ger and of Paterson and Thornburg indicate 
that distortion does occur. These latter two 
studies were both performed in settings in 
which there may have been more incentive 
toward distortion than in the other studies. 


The Present Study 


This study was performed in conjunction 
with the administration of an exploratory 
form of an engineering analogies test. This 
test was given to 203 seniors in the Institute 
of Technology at the University of Minnesota. 
Each senior was asked to indicate, to the best 
of his ability, his honor point average¹ based on 
all the courses he had taken as an under- 
graduate. The specific directions given to the 
group were as follows: “Now somewhere on 
your answer sheet, please put down your best 
guess of your total honor point average. It 
doesn’t have to be correct to the nearest hun- 
dredth, but do the best you can.” Previous 
instructions had indicated that the test to be 
given was exploratory in nature and that the 
results would not be used against the student 
in any way. This carried with it the implica- 
tion that the honor point averages were needed 
only for purposes of test validation. Thus, 
the actual recording of the estimated averages 
was probably free of any external reward or 
punishment involvement. Of the 203 stu- 
dents, only six failed to give this information. 
The true honor point averages were then ob- 
tained from the files of the College of Engi- 
neering. These were compared with the 
students’ estimates by computing the product- 
moment correlation and by analyzing the data 
by means of a scatter diagram. 
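
The honor point average requested of each senior follows the weighting given in the footnote to this section: three honor points per credit of A, two for B, one for C, and none for D or F. A minimal sketch of that calculation is given below in Python. It assumes that the average is total honor points divided by total credits; the function name and the sample transcript are illustrative and are not taken from the study.

```python
# Honor points per credit, as defined in the footnote: A=3, B=2, C=1, D=F=0.
HONOR_POINTS = {"A": 3, "B": 2, "C": 1, "D": 0, "F": 0}


def honor_point_average(courses):
    """Return total honor points divided by total credits.

    `courses` is a list of (letter_grade, credits) pairs.
    """
    total_credits = sum(credits for _, credits in courses)
    if total_credits == 0:
        raise ValueError("no credits attempted")
    total_points = sum(HONOR_POINTS[grade] * credits for grade, credits in courses)
    return total_points / total_credits


# Hypothetical transcript: 5 credits of A, 3 of B, 4 of C, 3 of D.
sample = [("A", 5), ("B", 3), ("C", 4), ("D", 3)]
print(round(honor_point_average(sample), 2))
```

On this scale a straight “C” record yields exactly 1.00, the value which the later discussion treats as the dividing line for the low group.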


Results 


Table 1 shows the relationship between the 
reported honor point averages and the actual 
honor point averages. The Pearson correla- 
tion coefficient between these two measures 
is +.94. Thus it may be concluded that the 
seniors were well aware of their true overall 
average. The correlation is higher than that 
which would have been obtained, had the 
students relied merely on their previous 
quarter’s grades for the formulation of their 
estimates. However, we may yet ask what 
the degree of accuracy of the reported averages 
is. The high degree of association does not 
insure high correspondence between the ab- 
solute values of the two measures. It guaran- 
tees only a constancy of rank placement of 
one with respect to the other. Thus, it would 
be possible for the distribution of reported 
averages to be displaced constantly upward 
or downward with respect to the distribution 
of true averages. That this did indeed occur 
within a subgroup of the seniors is clearly 

¹ The honor point average is calculated on the basis 
of three honor points for each credit of A, two for B, 
one for C and zero for either D or F. 


Table 1 


Relationship between Reported and True Honor Point 
Averages for 197 Engineering Seniors 
Note: Pearson r for ungrouped data = +.94 

[Scatter table of reported HPA (rows) against true HPA (columns), each in the class intervals 0.2-0.59, 0.6-0.99, 1.0-1.39, 1.4-1.79, 1.8-2.19, 2.2-2.59, and 2.6-2.99, with row and column totals. The cell frequencies are not legible in this copy.] 





evident in Table 1. The students whose ac- 
tual averages are below 1.00 tend, as a group, 
to suppress this fact. Thus, of 62 students 
whose true averages were below 1.00, only 30 
reported their averages as such. This dis- 
tortion did not occur at the other end of the 
distribution. Thus, of 28 students who re- 
ported averages above 2.00, 27 actually did 
have such averages. 
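
The point that a high correlation guarantees constancy of rank but not agreement in absolute value can be illustrated numerically. In the sketch below the true averages are invented for illustration, and every reported average is set a constant 0.30 above the true one; the product-moment correlation remains 1.0 although no single report is accurate. The helper name and the data are assumptions and are not figures from Table 1.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical true averages and reports inflated by a constant 0.30.
true_hpa = [0.4, 0.8, 1.1, 1.5, 1.9, 2.4]
reported_hpa = [t + 0.30 for t in true_hpa]

r = pearson_r(true_hpa, reported_hpa)
mean_error = sum(rep - t for rep, t in zip(reported_hpa, true_hpa)) / len(true_hpa)
print(round(r, 4), round(mean_error, 2))
```

The counts quoted in the text make the same point for the actual data: 30 of 62 is about 48 per cent of the low group reporting an average below 1.00, whereas 27 of 28, or about 96 per cent, of those reporting averages above 2.00 were correct.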

In view of the accuracy with which the rest 
of the group reported, it may be concluded 
that the distortion within the lower group is 
a reflection of a tendency toward conscious 
or unconscious falsification. Since a “C” 
average is required for graduation, this tend- 
ency toward overestimation may be caused 
by the student’s desire to be on the “safe side.” 
It is interesting, however, to speculate on the 
degree to which this result may indicate a 
tendency on the part of the low ability student 
to fool himself through a rationalization of 
his poor grades. At any rate, the results 
within this subgroup are surprising when it is 
remembered that the directions were such as 
to minimize the incentive toward falsification. 

In contrasting this study with previous ones 
reported, the results bear out the high accuracy 
of work history data reported by Keating, 
Paterson, and Stone. The marked distortions 
reported in the study by Paterson and Thorn- 
burg were probably due, in part, to the fact 
that students were asked to indicate whether 
they fell in the top, middle, or bottom third of 
their graduating classes. A student, even 
though he knows his scholastic average very 
well, may yet lack the specific information 
which would allow him to indicate accurately 
his percentile standing in the class. Thus, 
a student believing himself to be near the next 
higher group will, in the absence of normative 
data, place himself in that group. A large 
amount of distortion in the early study may 
have been due to such a lack of normative 
information. 

The present study indicates quite con- 
clusively that most graduating seniors know 
their total honor point average with a high 
degree of accuracy and so report it. However, 
the distortion which occurred within the group 
characterized by low academic standing points 
to the fact that the accuracy of such self-re- 
ports differs according to the magnitude of 
the honor point average. Persons interested 
in such scholastic information would do well 
perhaps to place full reliance only in those 
reports which indicate marked academic super- 
iority. For it is within this range that stu- 
dents tend to know their average most pre- 
cisely, and their feeling of pride concerning 
their achievements evidently precludes the 
occurrence of distortion in either direction. 


Received April 30, 1951. 
Validity of Minnesota Occupational Rating Scales * 
Harold Geist 


Stanford University 


Many psychologists have been interested 
in relating test scores to occupational require- 
ments. The Minnesota Employment Stabi- 
lization Research Institute pioneered with a 
standard battery of tests. Profiles were made 
for various occupational groups on intelligence, 
clerical, mechanical, spatial and manual dex- 
terity tests (2). The Occupational Analysis 
Division of USES carried this technique still 
further. For selection purposes, batteries 
of the most valid tests which had been de- 
veloped were used in varying combinations 
to establish profiles for each job studied. For 
guidance purposes, a standard battery was 
administered to persons employed in various 
families of occupations and patterns of apti- 
tudes were ascertained (3). Occupations were 
classified according to occupational families 
in such a manner as to make some two hundred 
profiles represent approximately two thousand 
major occupations. Profiles were based on 
critical minimum scores rather than mean 
scores. Tests in the battery were selected on 
the basis of factor analysis studies of vocational 
aptitudes involved in occupations in various 
key parts of the country rather than in one or 
two localities. 

The limitation of the small number of 
occupations in the Minnesota Employment 
Stabilization Research Study was partially 
remedied by the publication of the Minnesota 
Occupational Rating Scales in 1941 (4). 

The 1941 Minnesota Occupational Rating 
Scales contain a list of four hundred and thirty 


* The author wishes to express his indebtedness to 
Dr. George Barahal, Associate Professor of Education 
and Psychology, Wayne University, and formerly 
Director of the Counseling and Testing Center, Stan- 
ford University, and to Miss Patricia James, Psychom- 
etrist, Stanford University Counseling and Testing 
Center, for permitting him to examine the counseling 
records while employed there. He also wishes to ex- 
press appreciation to Dr. H. B. McDaniel, Professor of 
Education and Psychology at Stanford University, for 
reviewing the manuscript and offering valuable sugges- 
tions. The author is at present serving as Chief 
Clinical Psychologist, Mare Island Naval Hospital, 
Mare Island, California. 
occupations, each classified according to mini- 
mum requirements with respect to six abilities: 
academic, mechanical, social intelligence, cleri- 
cal, musical talent, and artistic. The defini- 
tions of these abilities as stated in the Manual 
(4) are: 


1. Academic Ability. By academic ability 
is meant the ability to understand and manage 
ideas and symbols. 

2. Mechanical Ability. Mechanical ability 
includes both the ability to manipulate con- 
crete objects, to work with tools and machinery 
and the materials of the physical world, and 
the ability to deal mentally with mechanical 
movements. 

3. Social Intelligence. By social intelligence 
is meant the ability to understand and manage 
people, to act wisely in human relations. 

4. Clerical Ability. By clerical ability is 
meant the ability to do rapidly and accurately 
detail work such as checking, measuring, classi- 
fying, computing, recording, proof-reading, and 
similar activities. 

5. Musical Talent. Musical talent requires 
the capacity to sense sounds, to image these 
sounds in reproductive and creative imagina- 
tion, to be aroused by them emotionally, to be 
capable of sustained thinking in terms of these 
experiences, and ordinarily the ability to give 
some form of expression in musical perform- 
ance or in creative music. 

6. Artistic Ability. Artistic ability refers 
both to the capacity to create forms of artistic 
merit and the capacity to recognize the com- 
parative merits of forms created. 


A review of the literature reveals that no 
validation has been made of the Minnesota 
Scales. The authors of the Scales modestly 
claim that “these judgments are believed to 
yield information of value in our struggle to 
understand occupational requirements in terms 
of human abilities.” It is because these 
Scales are of the “armchair” analysis type 
that some counselors are hesitant to use them 
as valid psychometric aids. 


Present Study 


The present study was undertaken to de- 
termine to what extent the profiles of the 
Rating Scales agree with the test profiles of 
a selected group of counselees. An arbitrary 
grouping of the four hundred and thirty oc- 
cupations in the Scales was made according 
to the groupings of the Dictionary of Occupa- 
tional Titles, i.e., each occupation was placed 
in a category of the DOT. Then the letter 
grade of each ability group was converted into 
a mean number grade for that category. For 
example, the first occupation in the Rating 
Scales is Accountant which has the following 
rating for the ability grouping: (a) Academic, 
A; (b) Mechanical, D; (c) Clerical, A; and 
(d) Artistic, D. 

For reasons to be stated later, only academic, 
mechanical, clerical, and artistic abilities were 
chosen. The rating of each of the ability 
groups was taken as the midpoint of the range 
of that letter grade. For example, in the 
Accountant occupation, A was considered to 
be 96.5 percentile, since the range as defined 
by the authors was 93-100 and D for mech- 
anical ability was taken as 12.5 percentile 
since the range was 1-25. The clerical and 
artistic ratings were similarly treated. 

The following arbitrary ratings were taken 
for each ability group, as defined by the authors 
of the Scales: 


A=96.5 percentile for academic, mechan- 
ical and clerical abilities, and 98.5 for artistic 
ability. 

B=83.5 percentile for academic, mechan- 
ical and clerical abilities, and 93.5 for artistic 
ability. 

C=50.5 percentile for academic, mechan- 
ical and clerical abilities, and 58.5 for artistic 
ability. 

D=12.5 percentile for academic, mechan- 
ical and clerical abilities, and 12.5 for artistic 
ability. 
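
The conversion just described amounts to a table lookup followed by averaging within each DOT category. The following Python sketch uses the percentile values listed above. The Accountant ratings are those given in the text; the second occupation and all function names are hypothetical and serve only to show the averaging step.

```python
# Percentile value assigned to each letter grade (from the list above).
STANDARD = {"A": 96.5, "B": 83.5, "C": 50.5, "D": 12.5}
ARTISTIC = {"A": 98.5, "B": 93.5, "C": 58.5, "D": 12.5}
ABILITIES = ("academic", "mechanical", "clerical", "artistic")


def to_percentiles(ratings):
    """Convert a dict of ability -> letter grade into ability -> percentile."""
    return {
        ability: (ARTISTIC if ability == "artistic" else STANDARD)[ratings[ability]]
        for ability in ABILITIES
    }


def category_profile(occupations):
    """Mean percentile per ability over all occupations in one DOT category."""
    converted = [to_percentiles(r) for r in occupations]
    return {a: sum(c[a] for c in converted) / len(converted) for a in ABILITIES}


# Accountant as rated in the Scales (from the text).
accountant = {"academic": "A", "mechanical": "D", "clerical": "A", "artistic": "D"}
# Hypothetical second occupation, invented for illustration only.
other = {"academic": "B", "mechanical": "C", "clerical": "B", "artistic": "C"}

print(to_percentiles(accountant))
print(category_profile([accountant, other]))
```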


A total of 150 counseling records were selec- 
ted at random from the files of the Stanford 
University Guidance Center. The vocational 
objectives stated on these records were grouped 
into DOT categories. One of the essential 
criteria of each record selected was that it 
contain raw scores on each of the following 
tests: AGCT (Civilian Form); Bennett Mech- 
anical Comprehension Test (BB); Minnesota 
Clerical Test; and Meier-Graves Art Test. 


In addition, as many records as possible 
were obtained which also contained the score 
on the Minnesota Spatial Relations Test. 

The following are the type of tests which the 
authors of the Minnesota Occupational Rating 
Scales state have been used in securing data 
for each of their categories: academic ability, 
tests of intelligence and academic achievement; 
mechanical ability, mechanical ability tests; 
clerical ability, clerical aptitude tests; and 
artistic ability, tests of art talent and art 
judgment. 

It will be observed that two of the ability 
groupings, social intelligence and musical 
ability, have been omitted. It was felt that, 
although the authors of the Scales state that 
personality and interest tests were used to 
derive their ratings, the present state of such 
tests in the determination of “social intelli- 
gence” is so highly questionable that they were 
not used. Tests of musical talent were given 
at the Stanford Guidance Center to such a 
small number of counselees that this ability 
category was necessarily ignored. 


Reasons for Choice of Tests in 
Ability Categories 


The choice of tests to compare with the re- 
sults of the Minnesota Occupational Rating 
Scales was a difficult problem. The authors 
of the Scales did not state which tests were 
used in their final summation. Tests were 
chosen which seemed to fit most nearly the 
description of the authors and at the same time 
were adequate for the population sampled. 
The principal problem in comparing the two 
sets of data was that of securing tests whose 
norms most nearly resembled or agreed with 
the vague “general population” norms of the 
Scales. Consequently, those tests were chosen 
upon which the scores most nearly represented 
the “general population.” Tests finally selec- 
ted for the individual categories were: 

1. Academic Ability. Civilian Edition of 
the AGCT. The AGCT Test (Civilian Edi- 
tion) was chosen since it is a test of general 
learning ability and also is considered to be 
one of the better group intelligence tests. The 
correlation of the AGCT with other well known 
tests of general learning ability is high (Army 
Alpha = .83; Otis = .79). It also seemed to 
meet the specification of standardization on 
what closely approximates a general population. 

2. Mechanical Ability. The Bennett (BB 
Form) was chosen as the best criterion for 
mechanical ability. According to the Manual 
(1) for this test, it is “designed to measure the 
capacity of an individual to understand various 
types of physical and mechanical relation- 
ships.” The norms used were those of people 
in “light mechanical work.” This group of 
subjects had an average age of 22.7 years with 
a standard deviation of 4.8 years, a mean edu- 
cation of 12.2 years with a standard deviation 
of 1.0 years. So far as is known to this author, 
there is no test of mechanical ability which has 
“general population” norms meeting the re- 
quirements of the specifications of the authors 
of the Minnesota Scales. 

3. Clerical. The Minnesota Clerical Test 
was chosen because it is an aptitude test meas- 
uring clerical speed and accuracy and is one of 
the few clerical tests which has norms on gain- 
fully occupied adults. 

4. Art. The choice of an adequate art test 
was an extremely difficult one. After survey- 
ing several art judgment tests at the Stanford 
Guidance Center, it was decided to use the 
Meier-Graves Art Test. According to the 
author of this test, its purpose is to measure 
six interlinked traits or abilities: manual skill, 
volitional perseveration, aesthetic judgment or 
intelligence, perceptual facility, creative imagi- 
nation, and aesthetic judgment. This quali- 
tative description far exceeds the requirements 
of the Minnesota Scales. Unfortunately the 
norms are on college adult art students and 
not general population norms. 


Results 


For each of the nine categories of the DOT, 
the profiles of the Minnesota Scales were 
plotted against the corresponding mean empir- 
ical test scores converted to percentiles. 
Differences were plotted in terms of plus or 
minus between the Minnesota Scales and the 
mean test scores. Figures 1, 2, and 3 furnish 
the data for the analyses which follow: 

Professional. In the professional category 
the profiles are fairly similar. The chief dis- 
crepancies seem to be in the academic ability 
categories and in the artistic category. It 
appears rather surprising that the mean per- 
centile score of the experimental group is 
below the estimate of the Minnesota Scales. 
It is probable that the Minnesota Scales are 
more nearly correct since the majority of pro- 
fessional workers in Stewart’s study (5) are 
1.5 sigma above the mean, which would place 
them at about the ninetieth percentile. 
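
The conversion from sigma units to a percentile used in this argument can be checked under the assumption that the scores are normally distributed. That assumption is made here for illustration only, since the text does not state the form of the distribution; the function name is hypothetical.

```python
import math


def sigma_to_percentile(z):
    """Percentile rank of a score z standard deviations above the mean,
    assuming a normal distribution."""
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


for z in (1.0, 1.5):
    print(z, round(sigma_to_percentile(z), 1))
```

Under that assumption 1.5 sigma corresponds to roughly the 93rd percentile and 1 sigma to roughly the 84th, so that “about the ninetieth percentile” is to be read as an approximation.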

Managerial. It is in the managerial group 
that the similarity between the test scores of 
the experimental group and the Minnesota 
Scales is most striking. The profiles are almost 
identical. The academic ability test scores of 
the experimental records are higher than the 
estimate of the Minnesota Scales. The reason 
again is probably because of the population 
sampled. 

Sales. In the sales group the two profiles, 
although not quite as similar as those of the 
managerial group, are almost identical. 
It is probable that the discrepancy 
between the artistic ability groupings is due to 
the large number of salesmen of artistic equip- 
ment and implements included in the experi- 
mental records. 

Services (Protective). In this group, the most 
striking difference is in the clerical grouping. 
The protective service category of the DOT 
consists mainly of police and allied personnel. 
In the higher “bracket” positions, it is doubt- 
ful that a high clerical ability (62nd percen- 
tile, Minnesota Scales) is needed for this oc- 
cupation, although it is conceivable that in 
such a position as desk sergeant, a high order 
of clerical ability is desired. An overall view 
of this group would seem to indicate that the 
experimental group would more nearly rep- 
resent the clerical ability level required of 
people in protective services. 

[Figure panels: Protective; Personal Service. Abscissa of each panel: Acad., Mech., Cler., Art.] 

Services (Personal). Here there is the great- 
est discrepancy between the Scales and the 
experimental group. It is probable that in a 
group where the skills required are of a rather 
low order, any attempt at prediction is at 
best nebulous. Consequently, it is felt that 
the true profile probably lies between the ex- 
perimental group and the Minnesota Scales. 

Services (Domestic). Since no domestic ob- 
jectives were chosen, this category was omit- 
ted. This was likewise true of Building 
Service Workers and Porters. 

Agriculture. Since the experimental group 
included only objectives in the agricultural 
and horticultural occupations, the fishery, 
forestry, hunting and trapping occupations 
were omitted. It will be observed that the 
shapes of the profiles of the experimental and 
Minnesota Scales are similar, but the discre- 
pancies are large. There are probably two 
reasons for this. The first is the difference in 
the sampling grouping. The second is that 
the Minnesota Scales were published approxi- 
mately ten years ago, and the Scales were de- 
vised some time before that. Because of the 
mechanization of agriculture, farmers today 
must in general possess greater skills in the 
ability groups sampled, especially in the 
mechanical area, where the discrepancy is 
the greatest. 

[Figure: Comparison of Minnesota Scales with Empirical Test Scores of Semiskilled, Skilled, and Clerical.] 

Skilled and Semiskilled. The abbreviated 
definitions of the skilled and semiskilled cate- 
gories according to Shartle (6) are: Skilled, 
“Includes craft and manual occupations that 
require predominantly a thorough and compre- 
hensive knowledge of processes involved in the 
work, the exercise of considerable independent 
judgment, usually a high degree of manual 
dexterity and in some instances extensive re- 
sponsibility for valuable products or equip- 
ment”; and Semiskilled, “The exercise of 
manipulative ability of a high order, but 
limited to a fairly well defined work routine. 


These occupations may require the perfor- 
mance of part of a craft or skilled occupation, 
but usually to a limited extent.” 

It was felt that the addition of a motor 
dexterity criterion was necessary to complete 
this category. Consequently, all cases in 
this category had in addition to their other 
tests, the Minnesota Spatial Relations Test 
results. Since the Minnesota Spatial Rela- 
tions Test norms are in percentile ratings corre- 
sponding quite closely with the Minnesota 
Occupational Rating Scales, the results fitted 
in quite closely with the other test results. 

The most outstanding difference between 
the two profiles is the discrepancy in the 
academic ability category. It appears that 
the estimate of the Minnesota Scales is low, 
while the experimental group is high, the true 
points probably lying somewhere between. 
In the theoretical mechanical group, the re- 
sults of the experimental group and the Minne- 
sota Scales are identical. The experimental 
group reached the sixty-ninth percentile on 
the motor dexterity test. The results of this 
test could not be integrated with the mech- 
anical comprehension tests, for although they 
are lumped together by the Minnesota Scales 
they tap essentially different abilities. 

Skilled. In the skilled category, one item 
is outstanding. The academic and mechanical 
ability (both theoretical and motor) groupings 
of both the Minnesota Scales and the experi- 
mental group are practically identical. The 
large discrepancies in the artistic and clerical 
categories might be explained by the difference 
in sampling. 

Clerical. The clerical category shows a 
distinctly higher rating on the Minnesota 
Scales in the academic ability grouping. Since 
the majority of those in the clerical category 
in Stewart’s study (5) lie one sigma above the 
mean, the Minnesota Scales appear to be more 
nearly correct. A tendency to underestimate 
the intellectual requirements of clerical person- 
nel by counselors in a college guidance center 
may affect the sample. The percentile ratings 
of the clerical grouping are identical or practi- 
cally identical as would be expected in this 
category. 

Summary 


1. The 430 occupations of the Minnesota 
Occupational Rating Scales were divided into 
nine categories of the Dictionary of Occupa- 
tional Titles. 

2. Each of the categories was subdivided 
into four of the six ability groupings of the 
Scales. 

3. The mean score of each ability group for 
all categories was computed and profiles made 
for each of the categories. 


4. An experimental group from the Stanford 
University Guidance Center was selected and 
the objectives were likewise divided into the 
nine categories of the Dictionary of Occupa- 
tional Titles. 

5. Test results were obtained on those ability 
groupings chosen for the Scales. These test 
results were selected on the basis of the tests 
which most nearly represented the definition 
of tests by the authors of the Scales. 

6. Test results of the experimental group 
were compared with the findings of the Scales 
by profiles. 

7. The test profiles of the experimental 
group agreed quite closely with those of the 
Scales. 

8. Further research in this field is indicated. 
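
The comparison summarized in steps 6 and 7 reduces to taking, for each ability, the signed difference between the Scales profile and the mean test percentile of the experimental group. A minimal sketch follows. All numerical values are invented for illustration and are not the data of this study; the function name is hypothetical.

```python
ABILITIES = ("academic", "mechanical", "clerical", "artistic")


def profile_differences(scales, tests):
    """Signed difference (Scales minus test mean) for each ability, in percentile points."""
    return {a: scales[a] - tests[a] for a in ABILITIES}


# Hypothetical profiles for one DOT category (percentiles).
scales_profile = {"academic": 90.0, "mechanical": 31.5, "clerical": 90.0, "artistic": 35.5}
test_profile = {"academic": 82.0, "mechanical": 40.0, "clerical": 88.5, "artistic": 35.5}

diffs = profile_differences(scales_profile, test_profile)
largest = max(diffs, key=lambda a: abs(diffs[a]))
print(diffs)
print("largest discrepancy:", largest)
```

A positive difference means the Scales place the requirement above the tested level of the group, and a negative difference the reverse.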


Received March 5, 1951. 
Lecture vs. Group Decision in Changing Behavior * 


Jacob Levine 
V. A. Hospital, Newington, Conn. 


and 
John Butler 


Trinity College, Hartford, Conn. 


To the industrial or group leader who is 
seeking to change the behavior or attitudes of 
people, psychology has had little to offer in the 
way of practical techniques or guiding prin- 
ciples. One of the few important contribu- 
tions to this problem was that of Lewin (1) 
when he compared the relative effectiveness 
of group decision with formal lectures in in- 
fluencing a group of women to change their 
eating habits during the war. His findings 
indicated that group decision was the more 
effective method. However, Lewin recognized 
that his results may have been due to a differ- 
ence in expectation between the two groups, 
for the group decision group had been told that 
a later inquiry would be made as to whether or 
not the members had carried out the suggested 
changes. No such information was given to 
the Lecture group. This forewarning of a 
subsequent checkup may have had some in- 
fluence on the decision of the first group to 
change with a consequent bias in the result. 

Studies on prejudice and social attitudes 
have demonstrated that education in itself 
does not reduce prejudice nor change attitudes 
significantly. See Samelson (4). The com- 
plex relationship between learning, perception, 
and motivation is no more dramatically illus- 
trated than here, where learning and correct 
perception can occur without leading to signi- 
ficant action. Were the acquisition of knowl- 
edge alone sufficient to lead to behavior 
change, many individuals would not be re- 
peating again and again the same personally 
and socially disastrous behavior patterns 
though they well know that different behavior 


* Submitted with the approval of the Chief Medical 
Director, Department of Medicine and Surgery, Vet- 
erans Administration, who assumes no responsibility for 
the opinions expressed in this paper which are those of 
the authors. 


would lead to more successful social relations. 
Within this problem lie hidden some of the 
most crucial problems of human adjustment 
and learning. 

Though we recognize the importance of moti- 
vation as well as the acquisition of knowledge 
in social change, it is often difficult to deter- 
mine the changing motivational factors in the 
specific situation. This cannot be omitted 
from the understanding of such problems as 
why group decision is more effective as a be- 
havior modifier than is the formal lecture. 
One can talk about greater ego involvement 
in the one case but just how this is related to 
motivation and to action is far from clear. As 
Lewin has pointed out, a higher degree of ego 
involvement does not necessarily lead to a 
decision to act. He suggested that perhaps 
in group decision the members are more likely 
“to make up their minds” or reach a decision. 
And though the making of a decision takes 
but a minute or two, once it is made, “it has 
an effect of freezing this motivational con- 
stellation for action.” But this explanation 
does not tell us how it is that the individual 
makes a decision to act more readily in the 
group decision than he does in the lecture. 
Though in each case, the translation of de- 
cision into action was ultimately made by the 
individual, it would seem that the step from 
the absorption of basic information to the 
making of a decision was more of a group pro- 
cess in the group decision method than it was 
in the lecture method. 


The present experiment was designed to 
repeat that of Lewin in a different setting 
under carefully controlled conditions of in- 
formation given and behavior changes meas- 
ured. In this study, group decision is com- 
pared with formal lecture as a method of 
producing changes in socially undesirable 
behavior. Both methods are then compared 
with one in which no attempt is made to bring 
about any change. Thus, the experiment was 
designed to answer two questions: 1. Is the 
acquisition of knowledge enough to lead a 
group of individuals to change a socially un- 
desirable behavior pattern? 2. Is group de- 
cision a more effective method of producing a 
change in behavior than is the formal lecture? 


The Experiment 


The subjects consisted of 29 supervisors of 
395 workers in a large manufacturing plant. 
The workers were on an hourly rate. These 
factory workers represented a wide variety of 
jobs and skills, ranging from unskilled manual 
labor to the most highly skilled machinist and 
toolmakers. All of these jobs were classified 
into nine different grades on the basis of skill 
and training required. 

Within each job grade three different hourly 
wage rates prevailed. The particular rate 
paid to any worker was determined in large 
part by the quality of his performance on the 
job. Performance was evaluated by one of 
the 29 foremen who supervised the work of 
these 395 men. Every 6 months each worker 
was rated by his foreman on established rating 
scales for 5 factors: 1. Accuracy; 2. Effec- 
tive use of working time; 3. Output; 4. 
Application of job knowledge; and 5. Co- 
operation. The sum of the scores on each of 
these five scales comprised a worker’s total 
performance rating and determined what wage 
rate he would get. 
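
The rating scheme described above is a simple sum over five scales followed by assignment to one of the three wage rates within the job grade. The sketch below illustrates it. The score range of each scale and the cut-off totals are not given in the text and are assumed here purely for illustration, as are the names used.

```python
FACTORS = (
    "accuracy",
    "effective_use_of_working_time",
    "output",
    "application_of_job_knowledge",
    "cooperation",
)


def total_rating(scores):
    """Sum of the five scale scores for one worker."""
    return sum(scores[f] for f in FACTORS)


def wage_rate(total, cutoffs=(10, 18)):
    """Map a total rating to wage rate 1 (lowest), 2, or 3 (highest).

    The cutoffs are hypothetical; the study does not report them.
    """
    low, high = cutoffs
    if total < low:
        return 1
    if total < high:
        return 2
    return 3


# Hypothetical worker rated 1-5 on each scale.
worker = dict(zip(FACTORS, (4, 3, 5, 4, 3)))
print(total_rating(worker), wage_rate(total_rating(worker)))
```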

Unfortunately, the results of this rating 
system were not equal to expectations. The 
foremen, in executing their ratings, tended to 
overrate those working in the higher job grades 
and to underrate those in the lower grades. 
This positive and negative “halo effect” re- 
sulted in the workers in the lower grades of 
jobs receiving the lowest of their respective 
wage rates while the more highly skilled 
workers consistently received the highest of 
their respective wage rates. Evidently the 
foremen were rating not only the performance 
of the individual worker but the grade of the 
job as well. 


The problem was set up to determine the 
most effective method of getting these supervisors 
to change the basis for the ratings so 
that a more equitable rating system would 
prevail. Our objective was to help these 
supervisors see that their task in rating each 
worker was to consider only how well he did 
his job and not how difficult the job was. Each 
supervisor was to understand that he was to rate 
the man and not the job. The task of the present 
experiment was to determine which was the more 
effective method of achieving this change in 
behavior of the 29 rating supervisors, group 
decision or the formal lecture. 


Experimental Procedure 


The 29 supervisors were randomly divided 
into three groups of 9, 9, and 11. It may be 
pointed out that all supervisors were experi- 
enced raters and had been rating employees 
for a number of years. The first group, Group 
A, consisting of 9 supervisors of 120 workers, 
served as a control group, and received no 
special instructions prior to rating. The 
second group, Group B, consisted of 9 super- 
visors of 123 men and served as the discussion 
group. The third group, Group C, consisted 
of 11 supervisors of 152 men, and served as 
the lecture group. 

Several days prior to rating, the members 
of Group B were gathered together around a 
table with the discussion leader. The leader 
did not sit at the head of the table nor did he 
lead the discussion. He introduced the pro- 
blem by showing a graph of the previous rat- 
ing and raised the question why it was that 
the high skilled workers were consistently 
rated higher in performance than the low 
skilled. From that point on, the leader merely 
acted as moderator and avoided injecting him- 
self into the discussion. All decisions and 
opinions were made solely by the group mem- 
bers. The discussion lasted one hour and a 
half. The group expressed a number of ideas 
and arrived at several conclusions. They 
finally reached one decision acceptable to the 
group: The way to avoid the inequalities in 
rating was to disregard the difficulty of the 
jobs and rate only the man doing the job. 
Consideration was to be given only to how 
well a worker was doing his job. All 9 members 
agreed on this decision. 

Group C, the lecture group, gathered in a 
formal lecture room and all sat facing the 
leader. They were given a detailed lecture 
on the technique and theory of employee per- 
formance rating. Some background material 
on wage administration and job evaluation 
was also included. The lecturer carefully 
pointed out the errors of their previous ratings 
and interpreted the reasons for their occur- 
rence. He illustrated his lecture with graphs 
and figures. He finally explained what each 
rater was supposed to do: that he was to rate 
individual performance and not difficulty of 
the job. After the lecture, questions were 
encouraged and asked by the raters; complete 
answers were given. The total session lasted 
about one hour and a half. 


Experimental Results 


Table 1 presents the results of the super- 
visors’ ratings according to labor grade. For 
comparison, the previous ratings by the same 
supervisors are included. In the pretraining 
rating we see the gradual decrease in mean 
rating as we go down in the labor grade. This 
“halo effect” characterizes each of the three 
experimental groups. In the post-training 
rating some changes are observable. 

For the sake of simplification of the results, 
the nine labor grades were arbitrarily divided 
into two categories: Low Group and High 
Group: the first four labor grades were placed 
in the High and the last five in the Low. 
When we compare the mean ratings of the Low 
and the High labor groups prior to training we 
find that the difference is significant at the 1% 
level of confidence for two groups and at the 
7% level for the third. In each case the size 
of the difference is at least one-third of a rat- 
ing unit in a total of three units. 
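
The grouping described above can be sketched in a few lines. This is only an illustration: the ratings in `ratings_by_grade` are hypothetical values on a three-unit scale, not the study's data, and the names `HIGH_GRADES`, `LOW_GRADES`, `group_mean`, and `high_low_difference` are introduced here for the example. It assumes, as the text indicates, that the first four labor grades form the High group and the last five the Low group.

```python
from statistics import mean

# Hypothetical ratings (three-unit scale), keyed by labor grade.
# Grade 1 is the most highly skilled grade, grade 9 the least.
ratings_by_grade = {
    1: [2.6, 2.4], 2: [2.5, 2.3], 3: [2.4, 2.2], 4: [2.3, 2.1],
    5: [2.0, 1.9], 6: [1.9, 1.8], 7: [1.8, 1.7], 8: [1.8, 1.6],
    9: [1.7, 1.5],
}

HIGH_GRADES = range(1, 5)   # first four labor grades
LOW_GRADES = range(5, 10)   # last five labor grades


def group_mean(ratings, grades):
    """Mean of all individual ratings pooled over the given labor grades."""
    pooled = [r for g in grades for r in ratings[g]]
    return mean(pooled)


def high_low_difference(ratings):
    """Mean rating of the High group minus mean rating of the Low group."""
    return group_mean(ratings, HIGH_GRADES) - group_mean(ratings, LOW_GRADES)


diff = high_low_difference(ratings_by_grade)
print(f"High minus Low mean rating: {diff:.2f}")
```

A difference near zero would indicate that raters are no longer favoring the higher labor grades; the significance test applied in the study is not reproduced here.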

In the second rating, only Group B shows 
any significant change in difference in the mean 
ratings of Low and High Grades. For Group 
A, the control group, the mean difference re- 
mains almost the same, and is still significant 
to the 1% level of confidence. Group C, the 
lecture group, shows a small decrease in the 
difference, but it is still significant to the 1% 
level. 

We may conclude that performance ratings 
were significantly affected only after the raters 
had had a group discussion and had reached a 
group decision. Neither increased experience 
in rating nor the learning about their previous 
errors in rating had any significant influence. 
Our findings completely confirm those of Lewin 
in demonstrating the greater effectiveness of 
group decision over the lecture method of 
training. 

The relationships between mean ratings for 
pre- and post-training are shown graphically in 
Figures 1, 2, and 3. For Group A, the two 
curves are seen to be essentially the same. All 
pretraining curves slope downward from high 
to low ratings more or less similarly. In the 
post-training curves it is only Group B 
that shows a flattening or equalization of 
mean ratings for the 9 labor grades. It is 


Table 1 

Mean Rating Differences between Low and High Labor Groups Before and After Training Sessions 

Column groups: Group A (Control), Group B (Group Decision), Group C (Lecture); 
under each: Mean Rating (1st, 2nd), Mean Diff., and Signif.* (p). 

1.8 1.7 .23 
2.4 2.2 .23 
63 45 
01 1 

* This is the probability that a difference this great or greater could have arisen simply through errors of 
random sampling. 

Fig. 1. Average rating of raters with no 
training sessions (Group A). [Axes: average rating by labor grade.] 

Fig. 2. Average rating of raters before and after 
group decision session (Group B). [Axes: average rating by labor grade.] 

Fig. 3. Average rating of raters before and after 
lecture session (Group C). [Axes: average rating by labor grade.] 


interesting to note that of the three groups, 
Group B had shown the greatest difference 
between high and low labor grades prior to 
training. The post-training ratings of Group 
C, the lecture group, show a consistent lowering 
all along the curve from the previous ratings. 
This reduction might be the result of 
an increased conservatism in rating by these 
raters as a consequence of the lecture, without 
affecting their basically prejudiced ratings. 


Discussion 

It is clear that group decision was more 
effective in reducing the prejudiced ratings of 
these factory supervisors than was the formal 
lecture. This in itself is a significant finding. 
But what seems to be even more striking is the 
fact that the lecture method had practically 
no influence upon the discrepancies in rating. 
It is generally assumed that once an individual 
or a group of individuals learn that they have 
been behaving in a socially undesirable way, 
they will immediately take steps to change, 
particularly if it is clear to these individuals 
that it is their responsibility to eliminate such 
errors. Our findings do not support such a 
notion. The acquisition of knowledge does 
not automatically lead to action. 

The findings also indicate that once a group 
arrives at a decision to act, the members, even 
though they may act as individuals, take on 
that decision and act in accordance with it. 
The force of this group decision was evidently 
sufficient to overcome the resistance to change 
in habitual ways of thinking and acting. How 
these group forces were able to operate upon 
the individual the present study does not re- 
veal. Further research is necessary to de- 
termine whether or not group decision leads 
to a “freezing of decision to act” whereas the 
lecture method does not. 


Summary 


A formal lecture method was compared 
with group decision in inducing 29 supervisors 
of 395 factory workers to overcome their 
biased performance ratings. The results 
showed that only the group of supervisors 
involved in group decision improved in their 
ratings. The lecture group did not change 
and persisted in overrating the more highly 
skilled workers and underrating the less skilled. 
The conclusion was drawn that group decision 


Negro-White Army Test Scores and Last School Grade * 


Byron E. Fulk and Thomas W. Harrell 
University of Illinois 


This study was undertaken in order to com- 
pare the performance of Negroes and Whites 
on the Army General Classification Test 
(AGCT) in World War II. AGCT scores 
used in the study were obtained from Manning 
and Informational Rosters of various organi- 
zations of the Army Air Force in World War II. 
These organizations were part of the Air Force 
Service Command. Included were such 
organizations as Headquarters Squadrons, 
Service Squadrons, Chemical Sections, Signal 
Companies, Quartermaster Companies and 
other organizations concerned with air base 
activities other than actual flying. 

A White sample of 2,174 scores is compared 
with 2,010 Negro scores. The two samples 
are compared in terms of the means, the 
standard deviations, and the per cent of over- 
lap. The groups have been sub-divided in 
terms of school grade completed and comparisons 
made at each level. The results of 
the comparisons are shown in Table 1 and 
Figure 1. 


* Fulk carried out the study reported herein as his 
Master’s thesis under the supervision of Harrell. 


Mean scores of the Whites exceed those of 
the Negroes at each grade level. All of the 
differences are statistically significant. The 
lowest critical ratio for a difference was 9.1. 
The percentage of Negroes whose scores ex- 
ceed the median score for the Whites as shown 
in Table 1 is 12 for the entire population 
studied. By years of school completed, over- 
lap varies from 17 per cent at grade 12 to two 
per cent at grade five. Overlap is higher at 
the higher grades, beginning with school grade 
ten, than it is at lower school grades. 
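
The two statistics used in this comparison can be sketched as follows. The paper does not give its formulas, so this is a minimal illustration under stated assumptions: the critical ratio is taken to be the difference between means divided by the standard error of that difference for independent samples, and the per cent of overlap follows the table footnote (the percentage of one group whose scores exceed the median of the other). The function names are introduced here for the example; the total-sample figures are read from Table 1 as printed (taking 21.2 and 20.7 as the White and Negro standard deviations), and the two short score lists are hypothetical.

```python
import math
from statistics import median


def critical_ratio(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference between two means divided by its standard error
    (independent samples; an assumed form of the paper's critical ratio)."""
    se_diff = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / se_diff


def percent_overlap(scores_a, scores_b):
    """Per cent of group B scores that exceed the median of group A."""
    med_a = median(scores_a)
    above = sum(1 for s in scores_b if s > med_a)
    return 100.0 * above / len(scores_b)


# Total-sample figures as read from Table 1 (assumed reading of the total row).
cr_total = critical_ratio(95.1, 21.2, 2174, 68.5, 20.7, 2010)

# Hypothetical raw scores, only to show how overlap is counted.
group_a = [80, 90, 95, 100, 110]
group_b = [60, 65, 70, 96, 99, 55, 72, 68, 75, 58]
overlap = percent_overlap(group_a, group_b)

print(f"critical ratio (totals): {cr_total:.1f}")
print(f"per cent of overlap (hypothetical data): {overlap:.0f}")
```

Overlap needs the individual scores, which is why it cannot be recomputed from the tabled means and standard deviations alone.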

It is not suggested that, because two individ- 
uals have attended school for an equivalent 
period of time, the factor of schooling is 
thereby controlled. This method, however, of 
keeping last school grade constant certainly 
can be expected to cancel out some of the 


Table 1 

Mean AGCT Scores of Whites and Negroes by Years of Schooling 

Years of     Number of Cases   Mean AGCT Score   Difference   Standard Deviation   Per Cent 
Schooling    White    Negro    White    Negro    Between      White    Negro       of 
                                                 Means                             Overlap* 

0            242      281      82.4     59.4     23.0         16.5     13.3        4 
1            73       51       91.2     58.4     32.8         16.2     19.6        10 
2            ...      63       88.4     57.8     30.6         20.5     18.6        9 
3            ...      127      91.2     57.6     33.6         23.2     18.1        6 
4            ...      151      90.6     59.8     30.8         23.8     20.0        8 
5            ...      139      90.4     54.6     35.8         25.2     11.6        2 
6            ...      191      88.0     59.6     28.4         21.3     14.0        6 
7            ...      174      85.4     64.4     21.0         15.2     14.7        9 
8            ...      202      94.5     69.2     25.3         15.4     17.3        ... 
9            ...      110      100.7    73.4     27.3         15.2     16.6        ... 
10           ...      140      102.5    79.0     23.5         14.2     17.0        ... 
11           ...      119      108.0    86.0     22.0         14.4     16.2        ... 
12           ...      172      109.2    93.0     16.2         15.4     17.8        17 
13 and up    ...      90       119.5    97.5     22.0         14.8     16.7        ... 

Total        2,174    2,010    95.1     68.5     26.6         21.2     20.7        12 

* The percentage of Negroes whose scores exceed the median White score. 
differences which are usually attributed to 
educational background. 

The rosters which provided the data con- 
tained no indication of the soldier’s place of 
birth or home address; consequently no information 
concerning possible differences due 
to regional origin or quality of schooling can 
be derived. 


Fig. 1. Years of schooling and mean AGCT score. 


Summary 

The performance of Negroes on the AGCT 
has been compared with that of Whites. The 
mean of the White scores was found to exceed 
that of the Negro scores. The difference is 
significant. Significant differences favoring 
Whites were discovered at each last school 
grade, but these differences were less at the 
higher school grades beginning with grade ten. 
Performances of both groups on the AGCT 
were found to be directly related to amount of 
schooling accomplished by the time of in- 
duction into the military service. This re- 
lationship began after school grade five for 
Negroes and after grade seven for Whites. 


Measuring Educational Leadership Attitudes * 


J. J. Valenti 


State Teachers College, : 


The problem was concerned with the de- 
velopment and evaluation of an instrument to 
measure the attitudes with which teachers and 
administrators view various problem areas 
pertaining to the social role of the teacher. 

By the time they have entered into active 
teaching roles, teachers and principals have 
formulated, consciously or unconsciously, 
certain “philosophies of education” (values 
or attitudes) which they use as a frame of 
reference in observing various aspects of their 
teaching jobs. The extent and degree to which 
the attitudes or “philosophies” of teachers and 
principals within a given school are in agreement 
has an important bearing on the nature 
of their personal relationship and their rela- 
tionships with others in the school situation. 
This study, then, is concerned more with the 
informal aspects of interpersonal relations 
than with the formal aspects. Because of the local 
structure of our public schools, it would also 
be of interest to consider the sentiments of 
other persons in the educational situation: 
parents, community, other employees, and 
students. This study, however, was restricted 
to the attitudes of teachers and administrators. 

The attitude evaluated in this study is the 
teacher’s frame of reference in his teaching 
role or the point of view he develops towards 
various aspects of his teaching situation. This 
point of view might be better called his 
“definition of the situation” or his “disposition 
toward certain styles of behavior.” In this 
study an instrument was devised which attempted 
to “measure” the various styles 
around which the teachers in a given school 
define their situations. From the results of 
an administration of such an instrument, it 
was hoped that an in-service training program 
could be drawn up to modify attitudes toward 
desired ones or at least to recognize existing 
ones which may create sources of friction. 


* This report was presented at the March, 1951 
meeting of the Minnesota Society for the Study of 
Education, and is a summary of a Ph.D. thesis, Development 
and Evaluation of a Leadership Attitude Scale 
Around the Social Role of the Teacher, on file in the 
University of Chicago Library. The writer expresses 
sincere appreciation to Professors W. M. Shanner, 
E. C. Hughes, and F. S. Chase, who served as his 
advisers in the study. 


Conceptual Framework 


The conceptual framework for the attitude- 
measuring instrument used in this study and 
in a previous study (6) of the International 
Harvester Research Project of the University 
of Chicago,¹ was developed from a survey of 
the literature of leadership, communication, 
and supervision. In the literature it appeared 
that there were two general approaches to the 
study of interpersonal relations. The first 
group of studies seemed to be formal-individ- 
ualistic approaches since they mainly con- 
sidered leader-follower problems in terms of 
qualities, skills, knowledge, and personalities 
of the individual leaders. In this group are 
represented various studies dealing with: (a) 
eminent men and heroes; (b) physical char- 
acteristics of leaders; (c) traits of leaders; (d) 
job analysis; and (e) supervisory practices in 
American Schools. The second group of 
studies appeared to be concerned with in- 
formal-group centered approaches. The writ- 
ings in this group in many cases have been 
influenced, generally, by the concepts of topo- 
logical psychology or field theory, and par- 
ticularly, by the leadership of Kurt Lewin. 

These influences are evidenced in the liter- 
ature of: (a) sociometry; (b) group dynamics; 
(c) human relations; (d) contemporary general 
administration; and (e) current supervision. 
While both of the above approaches have made 
valuable contributions to problems in inter- 
personal relations, they have neglected the 
force of institutional expectations and pressures 
which condition the roles of official leaders.² 

1 Nelson’s study, one of a series developed under the 
Harvester project, was the first one based on the 
rationale of the leadership continuum and the four 
leadership styles. 


2 The study of leadership has been parallel, if not 
central, to the more general problem of social psychology: 
the relationship between the individual and 
society, or, more particularly, between the individual 
and the group. In both areas there is in process a 
profound shift in methodology and attention from 
individual to social interaction. 


These Chicago research studies have viewed 
the leadership process as mainly one of maintaining 
effective communication, and have tried 
to determine the internalized attitudes of various 
levels in the organizational hierarchy 
toward the social relationships at the “hard 
core” of the institution.³ The “hard core” is 
that point where institutional functions are 
translated into action, where organizations 
focus their attention. Since the role of the 
teacher represents the central position in the 
educational organization concerning which the 
organizational agents (superintendents, school 
board members, principals, teachers, pupils, 
parents) may have varying concepts, it is well 
suited for such an analysis. 

3 The term “hard core” has been described in the 
lectures of Prof. E. C. Hughes, Department of Sociology, 
University of Chicago. 

This study has attempted to measure the 
underlying attitudes of teachers and principals 
toward the social role of the teacher along a 
four-point continuum. The continuum, which is 
a “measure” of social distance, shows the extent 
of interaction or communication between levels 
in the organizational hierarchy as it passes 
through the role of the teacher, and varies from 
an (external) authority-centered approach to 
an (internal) group-centered approach. 

At one extreme the teacher defines his role 
in terms of impersonal, direct, formal, and 
one-way communications in the interaction 
process in order to maintain his own and organizational 
status, and depends upon a “technical” 
approach. At the other extreme, he uses personal, 
indirect, informal, and two-way communications 
in order to develop the pupil and the 
class and depends upon a “human” approach. 

Four fairly distinct points are hypothesized 
along the continuum representing four distinct 
styles of behavior. These four styles have 
been called respectively: A. Impersonal Style; 
B. Personal Style; C. Counseling or Developmental 
Style; and D. Integrating or Coordinating 
Style. These styles are indicative of 
the teacher's and principal's attitudes toward 
communication with other persons in the school 
organization (pupils, other employed personnel, 
principals and teachers, parents, the community, 
superintendent) in certain areas of contact 
with these people. They reveal, then, the 
teacher's “philosophy of education,” or the 
way he defines his situation. 


A. The Impersonal Style represents the 
teacher who sees authority and expert opinion 
at the top of the hierarchy of values with 
himself as the representative of that authority 
and all pupils of equal consideration below. 
He receives a great feeling of security in depending 
upon the expert opinion and in following 
“the rules and regulations” of his position 
rather closely. He is inclined to be loyal, con- 
forming. The tone of his interaction is formal, 
marked by frequent one-way communications 
and infrequent two-way communications. 

B. The Personal Style represents the teacher 
who is a rugged individualist, technically- 
proficient, a good disciplinarian and hard 
worker. He receives a great deal of satisfac- 
tion from his own creative work, relies mainly 
on his own ability and knowledge. The tone 
of his interaction is less rigid than that of the 
Impersonal Style teacher. He maintains in- 
frequent two-way contacts, but his interaction 
is more personal. 

C. The Counseling or Developmental Style 
adds more to the qualities of the teacher. The 
teacher of this type is interested in social con- 
tact, in developing and guiding his pupils. He 
does this mainly through the use of individual 
incentives—praise, reward, friendliness. He is 
very much concerned about the background of 
each of his pupils so he may “guide” them. 
For this reason he is likely to use tests and 
measurements to a great extent. The tone of 
his interaction is much less formal than the 
Impersonal and Personal Styles although his 
methods of counseling are mostly “directive.” 
He shows a somewhat two-way interaction. 

D. The Integrating or Coordinating Style. This 
style of behavior represents the other extreme 
of the continuum—the informal or group ap- 
proach. A teacher with this style tries to 
develop group standards, helps the group to 
express its own opinions. He conceives of his 
participation with the group as being that of 
a “catalytic” agent rather than that of an 
authority. The tone of his interaction is very 
informal with frequent, unstructured (and non- 
directive) two-way communications.⁴ 


Methodology 


Since the attitudes of teachers and principals 
were to be studied in terms of the framework 
of the four leadership styles, it was necessary 
to select methods of evaluation appropriate 
to the needs of the investigation. Although 
possible methods of appraisal included pencil 
and paper tests, observation techniques, interviews, 
questionnaires, collection of actual (created) 
products of those being tested, and 
records (especially anecdotal and behavior 
records), it was decided that the needs of this 
investigation could best be served by the use 
of a self-administered inventory, a form of 
attitude questionnaire. 

4 These four styles are identical to the four concepts 
described by Dr. Charles W. Nelson, Industrial Relations 
Center, University of Chicago, in a paper given 
at the 1950 American Psychological Association meetings, 
Pennsylvania State College. The terminology 
used in Nelson’s paper is A. The Bureaucratic-Regulative 
Concept; B. The Autocratic-Competitive Concept; 
C. The Idiocratic-Manipulative Concept; D. The Democratic-Integrative 
Concept. 



The following hypotheses were set up: 


1. The teacher's or administrator's attitudes 
as measured by scores based on their responses 
to the instrument indicate that the four leader- 
ship styles fall along a four point continuum. 

2. Items for the instrument can be so con- 
structed that independent qualified judges can 
recognize the inherent attitudes, and group the 
items according to the styles of behavior the 
scale was designed to reflect. 

3. The scale can be so constructed as to 
reveal individual differences in attitudes which 
can be measured and which overcome any 
stereotypes which may exist concerning the 
role of the teacher. 

4. Evaluations of the respondents by admin- 
istrators, supervisors, and fellow teachers will 
be positively related to their (respondents’) 
scores on the scales. 

5. (Null hypothesis.) There will be no sig- 
nificant relationship between the respondent's 
scores and such personal and social factors as 
age, amount of professional education, years of 
experience, grade or level taught, subject 
taught, school system taught in, teaching load, 
average intelligence of students taught, and size of 
class taught. 


In constructing items for the inventory 
(items developed from sampling the activities 
through which teachers manifest their attitudes 
toward their social relationships), the 
following sources of information were used: 


1. Experiences in teaching. 
2. Textbooks on student teaching and general 
methods. 
3. Textbooks on public, private, and educational 
personnel management. 
4. Interviews with graduate students in 
education. 
5. Questionnaires distributed to teachers, 
administrators, and graduate students. 


As a result of consultation of these sources, 
the following relationships and areas were 
selected as representative ones for studying the 
role of the teacher in this investigation. 


I. Dealing with Pupils. 
1. Handling pupil infractions of rules: discipline. 
2. Methods of instruction: handling individual differences. 
3. Planning the classroom work. 
4. Rating, testing, and recording behavior of pupils. 
5. Handling routine classroom matters. 
6. Qualities of a good pupil: the teacher’s demands. 
7. Handling minor complaints (grievances and complaints). 
8. Dealing with pupils’ suggestions. 
9. Dealing with pupil cliques: informal classroom groups. 
10. Dealing with organized student groups: student organizations. 
11. Motivating pupils: pupil incentives. 
12. Determining pupils’ attitudes and stimulating morale. 

II. Dealing with the Principal and Other Teachers. 
13. Qualities of a good teacher: teacher selection, teacher expectations. 
14. Orientation of new teachers: induction. 
15. Rating of teachers. 
16. Improvement of instructors: in-service training. 
17. Changing methods of instruction: adjustment to change. 

III. Dealing with Parents. 
18. Handling parents’ suggestions and complaints. 

IV. Dealing with the Community. 
19. Relationships with community groups. 

V. Dealing with the Superintendent. 
20. Rules, duties, and policies: the superintendent's demands. 
21. Getting action on school problems: teacher’s suggestions. 
22. Incentives to better teaching. 

VI. Dealing with Other Employed Personnel. 
23. Colleague relationships: dealing with the custodian. 


Items were then constructed in the twenty- 
three areas which were indicative of the four 
styles of leadership. The items were incorporated 
into the inventory (which was tentatively 
named “Opinion Inventory”) through a 
modified use of Thurstone’s method of paired 
comparisons (3,8). It was hoped that “disguising” 
the scale with comparative judgments 
would avoid absolute judgments from respondents 
and minimize to a great extent some 
tendency toward “intellectualized” answers. 
Two methods of scoring (direct scores, indirect 
scores) were established. 

Two sets of judges (one set having previous 
knowledge of the framework of this scale, and 
the other set having no such knowledge) were 
asked to classify the items. In the first set, 
twenty selected judges were given written 
explanations of the framework of the continuum 
and of the four leadership styles. 
They were also each given ninety-two cards 
(twenty-three areas, four cards for each area) 
with the four alternatives in each area placed 
in random order. 

The twenty judges, including five University 
of Chicago faculty members, five research 
assistants, five graduate students, and five 
educators employed in the field, were asked 
to read the definitions carefully, and then to 
classify each item as A, B, C, or D in terms of 
the framework. 

In the second test, four judges (four Univer- 
sity of Chicago Professors) were given a similar 
set of cards, each card containing one of the 
four alternatives. These judges were not given 
definitions of the four styles. In addition to 
being asked to describe the four styles they 
were also requested to arrange the four patterns 
on a four point continuum and to state briefly 
what they thought the framework of the continuum 
was. The object of both tests was to 
correlate the testmaker’s classifications of the 
items with the ratings of the two sets of judges, 
and also to scale the items somewhat in the 
manner of Thurstone’s method of equal-appearing 
intervals. 


Results 


A preliminary form of the “Opinion Inven- 
tory” was administered to 73 subjects and a 
revised form to 515 teachers and administra- 
tors of 41 schools (14 school systems in Illinois, 
Indiana, Michigan, and New York). An 
attempt was made to select schools whose 
personnel were representative of those in 
typical city school situations. Table 1 in- 
dicates some of the characteristics of the sub- 
jects selected for this study. The two adminis- 
trations of the inventory revealed that: 


1. The instrument could, to a great extent, 
reveal individual differences in attitude scores. 
Fairly large ranges and standard deviations 
were obtained for the A, B, C, and D style 
distributions, as is illustrated in Table 2 (hypothesis 3). 


Table 1 

Comparison of Some Personal and Social Characteristics of the 515 Respondents with Those of 
Educators in Typical City School Situations 

Characteristic                        Average                   Range                Nat’l. Average 

Length of Experience                  14                        0-30+                - 
Academic Training 
  (a) Pre-Service                     Bachelor’s Degree (57%)   No Degree-Doctor’s   Bachelor’s (44.7%)* 
  (b) In-Service: No. of Graduate 
      Courses in Last 3 Years         Less than 1               0-16+                - 
Age                                   33                        21-65                - 
Sex (% Male)                          35%                       -                    19.1%** 
Teaching Load 
  (a) Classes per Day                 5                         1-8                  - 
  (b) Average Class Size              31.4                      15-45+               30.4** 
Brightness of Pupils                  Average                   Below-above av.      - 
Grade Level 
  Elementary                          53%                       K-6                  56.7%** 
  Secondary                           47%                       7-12                 43.3% 
Position Held                                                   Teacher-Supt. 
  Superintendent                      3%                                             1%** 
  Principal                           7.5                                            4 
  Teacher                             89.                                            95 
Population of Locality                (2,500-9,999)†            1,200-8,000,000      (2,500-9,999)** 

* 48 State School Systems, Chicago: Council of State Governments, 1949. 
** Biennial Survey of Education, 1945-46. Statistics of City School Systems, 1945-46. 
† Represents most frequent category. 

Table 2 

Means, Standard Deviations, and Ranges of Teacher and Administrator Leadership Attitude Scores 

                      Impersonal   Personal    Counseling   Integrating 
                      “A” Style    “B” Style   “C” Style    “D” Style 

Mean                  24.5         30.00       38.00        45.5 
Standard Deviation    7.0          6.5         [?].5        9.0 
Range                 8-49         11-52       20-58        20-63 
N                     515          515         515          515 


2. The large standard deviations obtained 
and curve-fitting procedures showed that the 
distributions of the four leadership scores fit 
the normal curve rather than the J-shaped or 
U-shaped curves which are indicative of stereotyped 
or categorical responses. Chi square 
tests showed no significant differences between 
observed and expected (normal curve) frequencies 
for the four leadership distributions. 
Table 3 and Figure 1 compare the two distributions 
(hypothesis 3). 

3. Correlations among the total scores for 
each style indicated that the continuum is 
followed as predicted. The intercorrelations 
are shown in Table 4. Further proof of the 
existence of the continuum was provided by 
judges’ ratings of the items with and without 
definitions of the four styles. Scale values 
determined by the psychophysical method of 
equal-appearing intervals followed the predicted 
continuum (hypothesis 1). 

4. The correlation of the test key with the 
judgments of the four professors without definitions 
of the four styles obtained a product-moment 
coefficient of +.90. The descriptions 
by these four of the four styles were remarkably 
close to the testmaker’s rationale. The 
coefficient for the relationship between the twenty 
judges with knowledge of the rationale 
and the testmaker’s items was +.86 (hypothesis 2). 

5. An item analysis of the preliminary form
of the inventory showed that the majority of
the item difficulties clustered around the 50
per cent level although there was considerable
range among them. A Kuder-Richardson
formula⁵ for determining reliability was used
in conjunction with the item analysis. An
“abac” chart developed by Flanagan (2) was
used to determine coefficients from the data
of the upper and lower twenty-seven per cent
of the distributions. Coefficients of .81, .73,
.57, and .88 were obtained for the direct A, B,
C, and D scores, while coefficients of .91, .88,
and .61 were obtained for the indirect A, B,
and C scores. The indirect score method in
this case is the scoring method which counts
both direct and indirect responses
of subjects. Reliability coefficients (K-R)
for direct A, B, C, and D scores on the revised


⁵ The K-R formula used was:
$$r_{tt} = \frac{\sigma_t^2 - \Sigma pq}{2\sigma_t^2} + \sqrt{\frac{\Sigma r_{it}^2\,pq}{\sigma_t^2} + \left(\frac{\sigma_t^2 - \Sigma pq}{2\sigma_t^2}\right)^2}$$
The assumption made in using this formula is that a
single factor is being measured. The use of the K-R
formulas is explained in G. F. Kuder and M. W. Richardson,
The theory of the estimation of test reliability.
Psychometrika, 1937, 2, 151-160.


Table 3


Chi Squares, Degrees of Freedom, and Probability Levels in Comparing Observed Frequencies and Expected
(Normal Curve) Frequencies for the Four Leadership Distributions


                      Impersonal    Personal      Counseling    Integrating
                      “A” Scores    “B” Scores    “C” Scores    “D” Scores
Chi square            11.58         11.62         14.30
Degrees of freedom    10.00         9.00
Probability level     .50           .30

[Figure 1. Vertical axis: height of ordinate (frequency).]


form were respectively .79, .74, .65, and .87.
Standard errors of measurement were computed
for the four styles. For the first administration
they were ±3.22, ±3.33, ±3.34,
±3.21. For the revised test they were ±3.21,
±3.32, ±3.25, ±3.24.

6. As a check on the validity of the in- 
ventory, evaluations of the respondents were 
made by their colleagues and supervisors. A 
sample of about one-fifth of the respondents 
(105 subjects) was selected for study. The 
cases in this sample were made up of those 
individuals whose leadership scores were one 
standard deviation above the mean for either 
the A, B, C, or D styles. In each case, either 
a superintendent, principal, or fellow teacher 
(sometimes both principal and superintendent) 
was given the definitions and descriptions of 
the four styles, and asked to observe and eval- 


Table 4 


Correlations between A, B, C, and D Scores of 
Teachers and Administrators 





Fig. 1. Histogram and normal curve comparing distribution of impersonal “A” style scores (horizontal axis: scores).


uate the respondent in terms of which style of 
leadership he preferred. A product-moment 
correlation of +.59 was obtained between the 
subjects’ test scores and the ratings by col- 
leagues and supervisors. This would indicate 
a significant degree of relationship between 
the inventory scores and a criterion—the 
evaluation of leadership attitudes by colleagues 
and supervisors. However, it should be 
pointed out that this method of validation 
assumes that evaluations can be made reli- 
ably and in terms of a single index of classi- 
fication. Other methods of validation for 
further study are recommended below.⁶

7. A search, through analysis of variance,
for certain personal and social correlates of
leadership attitudes showed few significant
relationships. Table 5 gives the F ratios
obtained. Age and experience partially affect
attitudes: the younger and less experienced
the person, the more integrative are his attitudes;
the older and more experienced, the more
formal and impersonal they are. Amount of
academic training as evidenced by the posses- 


⁶ In a study of foremen in industry, Nelson found a
product-moment coefficient of +.46 between the foremen’s
attitude scores and analyses of their Rorschach
and TAT’s by clinical psychologists. In recent months
Nelson has been finding significant relationships be- 
tween leadership attitudes and Szondi test responses 
Table 5


F Ratios among A, B, C, and D Scores for Certain Personal and Social Characteristics of 
Teachers and Administrators 








Total 
d.f. 


Characteristic 


Age 
Sex 1.01 
4.01 
1.83 
3.53** 
2.46* 
2.01** 
1.72 
2.10* 
1.11 
1.21 
1.22 


Position 

Experience 
Pre-Service Training 
In-Service Training 
Subject Taught 

No. of Classes 

Class Size 
Brightness of Pupils 
Grade Level 

School System 


* Significant at the five per cent level. 
** Significant at the one per cent level. 
*** Significantly homogeneous. 


sion of advanced degrees, or by the recent 
completion of graduate courses appears to 
have some relationship to integrating “D”
style scores. However, the most important 
determinant seems to be the individual school 
situation. It appears that the human rela- 
tionships within each school are more effec- 
tive in shaping the attitudes of the staff mem- 
bers than the attitudes of other schools in the 
system (hypothesis 5). 
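
The standard errors of measurement reported under result 5 can be checked against the usual relation SEM = SD × sqrt(1 − r). The paper does not say which formula was used, so that relation is an assumption here, as is the pairing of the Table 2 standard deviations with the revised-form reliabilities (the “C” standard deviation is read as 5.5). The names `sem`, `SD`, and `RELIABILITY` are illustrative only.

```python
import math

# Standard deviations from Table 2 and K-R reliabilities for the revised form.
# Pairing these two sets of figures is an assumption made for illustration.
SD = {"A": 7.0, "B": 6.5, "C": 5.5, "D": 9.0}
RELIABILITY = {"A": 0.79, "B": 0.74, "C": 0.65, "D": 0.87}

def sem(sd, r):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - r)

for style in "ABCD":
    print(style, round(sem(SD[style], RELIABILITY[style]), 2))
```

Under these assumptions the computed values fall within about .01 of the 3.21, 3.32, 3.25, and 3.24 reported for the revised test.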


Implications and Recommendations 


This study has been of value since the in- 
strument developed can help: (a) provide a 
morale picture of schools; (b) show the nature 
of the process of interaction on the attitudinal 
level; (c) provide a framework for viewing 
informal communication within a system; 
(d) place leadership on the basis of communi- 
cation; and (e) measure leadership on a con- 
tinuum. Generally the instrument developed 
provides a technique for objectifying and 
measuring social interaction itself rather than 
interaction inferred from the characteristics 
of individuals involved, the structure of the 
organization, or the technique used. To make 
the “Opinion Inventory” of further value to 
the field of education certain recommendations 
are made: 


Impersonal 
“A” Scores 


124 


Personal
“B” Scores 


Counseling Integrating 
“C” Scores “D” Scores 

1.04 

1.72 

1.81 

1.35 

1.94 

1.43 

1.49 

1.13 

1.14 

1.47 

1.02 

1.12 


1.70 
1.13 
6.41*** 
2.70** 
1.64 
1.06 
1.54 


1. A study should be made of the psycho- 
logical factors in the four leadership styles. 
Projective tests administered to the teachers 
studied and interviews by clinical psychol- 
ogists can make further revelations of be- 
havior expected from the four types. The 
study of personality would certainly be of
assistance in bringing about adjustments in 
social relationships. 

2. A more comprehensive study of cor- 
relates in leadership attitudes should be made. 
The investigation of social factors in this 
report has been little more than superficial. 
More subtle factors should be examined, such as
status, parental upbringing and childhood
experiences, interests, and motivations. It should
be evident that more informal methods of 
investigation are needed to obtain this in- 
formation. 

3. Further checks for validity, that is, on the
reality of the four styles, can be made by
long-range, intensive, non-directive interviewing
and by continuous observations of behavior
by participant observers.

4. The instrument can be improved by: 


(a) A better method of scoring. The use 
that has been made of a single test score for 
each leadership style can be easily challenged 
on statistical grounds. To justify the use of 
a single score, positive intercorrelations among
all the items are needed. This would involve 
the computation of some 10,000 correlations. 
Since this is a rather tedious task, the use of 
short cuts such as Thurstone’s Edgemarking 
method should be sought. 

(b) If the above intercorrelations are cal- 
culated, they can be of use to further research 
in the form of a factor analysis. 

(c) Scaling of the items is suggested. Some 
device to determine psychological distance 
between choices may be used such as Thur- 
stone’s law of comparative judgment (case V) 
or Guttman’s scaling method. 

(d) A reduction in the length of the inventory
is in order, in line with the suggestions of many
of the respondents. This may be achieved
by making alternative forms of the 138 items 
through two scales of 69 items each, and by 
the use of some other psychophysical method. 
Although paired comparison methods are very 
subtle and make for substantial internal con- 
sistency, they are reported to be very boring 
for the subjects responding. Another sugges- 
tion for reducing the length of the inventory 
is to eliminate the two least discriminating 
pairs of responses from each of the 23 areas. 
This would result in a scale of 92 items. 


5. Research is needed to determine which 
styles are needed for each leadership situation. 
The casual reader might infer from the rationale
that the integrating “D” style individual
is the preferable one. This is not
the intention. If the leadership needs stem 
from the demands of the situation, it would 
probably be very difficult for a group long 
accustomed to formal and structured re- 
lationships to adjust itself to an integrating 
style leader. We must determine how effec- 
tive the four leadership styles are in various 
types of situations. 

6. Once the style of leadership needed has 
been determined, there remains the matter of 
changing and modifying individual and group 
attitudes. Research in modifying leadership 
attitudes, in helping people to redefine their 
roles, is necessary. 


7. A form of the “Opinion Inventory” 
should be administered to the several levels
of the hierarchy in the educational organiza- 
tion. By observing the responses of teachers, 
pupils, parents, the principal, the superintend- 
ent, community groups, and other employed 
personnel in a local situation, we may better 
understand the nature of the network of in- 
formal organization. 

In the further administration of the “Opinion
Inventory,” its use as an index for promotion,
hiring, firing, rating, and salary
scaling is definitely not recommended. Such 
actions would probably destroy the good 
rapport needed with subjects and render the 
instrument useless. 

8. The framework used in this study may 
be used in other occupations. It has already 
been used for foremen in industry, and for the 
dealer-customer relationship. It can prob- 
ably be used in studying the communication 
process at the hard core of any social insti- 
tution. It is believed that the eight recom- 
mendations above can help the instrument 
make a greater contribution to educational 
research. 
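
The item counts cited in recommendation 4 can be checked with a little arithmetic. The sketch below assumes, as the figures imply, that the 138 items are spread evenly over the 23 areas and that each discarded pair of responses removes one item; the variable names are illustrative.

```python
from math import comb

ITEMS = 138   # items in the inventory
AREAS = 23    # content areas

# Pairwise item intercorrelations needed to justify a single score.
n_correlations = comb(ITEMS, 2)        # 138 * 137 / 2

# Items per area, assuming an even spread across areas.
items_per_area = ITEMS // AREAS

# Dropping the two least discriminating pairs of responses in each area,
# counting each pair of responses as one item.
shortened_length = ITEMS - 2 * AREAS

print(n_correlations, items_per_area, shortened_length)
```

This gives 9,453 intercorrelations (the "some 10,000" of recommendation 4a), 6 items per area, and a shortened scale of 92 items.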
Verbalization and Learning a Manipulative Task


C. H. Lawshe and William Cary 
Occupational Research Center, Purdue University 


The growing recognition of the importance 
of industrial employee training has been ac- 
companied by a search for those principles 
of training which will maximize the effective- 
ness of job instruction. One such principle 
which has received considerable attention in 
recent years is that of encouraging the learner 
to repeat to the instructor those things which 
he has learned. The implication of this 
process is that it will show those gaps in the 
process where the learner did not absorb the 
presented ideas, and that it will also fix the 
process in his mind. 

This study was designed to determine the 
effect of verbalization on the number of trials 
required to learn a manipulative task. In 
addition, the effect of verbalization on a 
learner’s ability to conform to a prescribed 
procedure was investigated. 


Procedure 


Method. The method utilized tasks consisting
of sub-tests A-2 and A-4 from the Purdue
Mechanical Assembly Tests (1). These tests¹
are boxes in which various types of levers, 
gears, pinions, and shafts are assembled so that 
a given type of mechanical action will occur. 
The parts are presented on a mounting board 
so that the unassembled test may be intro- 
duced to the subjects in the same fashion each 
time. 

Matching Phase. Fifty-two male students
enrolled in an elementary psychology class at 
Purdue University served as subjects. Using 
a carefully standardized training procedure, 
they were individually taught how to assemble 
sub-test A-2. After two practice trials, three 
timed trials were given. All deviations from 
the standardized assembly method were re- 
corded as errors. On the basis of the timings 


¹ For purposes of this experiment, the knurled screws
in each sub-test were disregarded. Preliminary analysis 
of the task revealed that such a modification would 
minimize the actual time required for assembly without 
altering the nature of the task. 


only, individuals were matched and put into 
either a control group or an experimental 
group. Next, members of each group were 
taught to assemble sub-test A-4. The pro- 
cedure followed with the control group was 
identical to that just described. The experimental
group procedure differed only in
that members were required to “talk-back”
their instructions to the trainer, i.e., to verbalize
as they assembled. Each subject had as
many trials as necessary to attain an a priori 
time standard; all trials were timed and, since 
certain errors were not reflected in time scores, 
they were also recorded. 


Results 


Following the collection of these data, a 
statistical analysis was made of the differences 
of performance between the experimental and 
control groups. The differences between
the means and standard deviations
of the control and experimental groups on the
number of trials to reach the criterion were
investigated. Such a comparison seemed
likely to reveal any differences between the 
groups which would tend to be obscured on 
the trial criterion. The procedure followed 
was the orthodox method used in testing differ- 
ences between two small matched groups. 
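
A minimal sketch of such a matched-groups comparison, assuming the "orthodox method" is the t ratio computed on within-pair differences. The paper does not print the formula, and the trial counts below are invented purely for illustration; the function name `matched_t` is hypothetical.

```python
import math
from statistics import mean, stdev

def matched_t(control, experimental):
    """t ratio for matched pairs: mean difference over its standard error."""
    diffs = [e - c for c, e in zip(control, experimental)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1   # (t, degrees of freedom)

# Hypothetical trials-to-criterion for five matched pairs (not the study's data).
control_trials = [6, 8, 5, 9, 7]
experimental_trials = [7, 7, 6, 9, 8]

t, df = matched_t(control_trials, experimental_trials)
print(f"t = {t:.2f} with {df} degrees of freedom")
```

A small t of this kind, judged against the tabled value for the available degrees of freedom, is what leads to the conclusion that a difference could have arisen by chance.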

In summarizing these statistical data, it 
was found that the “t” ratios of the means 
on the number of trials required to meet the 
criterion and the average time on the first 
three trials were .34 and .61, respectively. 
These small values of “t” indicate that the
obtained differences of performance between 
the two groups could readily have occurred 
by chance alone. Similarly, critical ratios of 
the standard deviations are not large enough 
to permit refutation of the hypothesis that 
there is no true difference between the groups 
in variability of performance. It was also 
observed that the critical ratio for the number 
of trials to meet the criterion is .81 and the 
critical ratio for the average time on the first
three trials is 1.38. 

Although the latter critical ratio is non- 
significant according to accepted standards, 
it does suggest that verbalization may exert 
a differential effect on the “fast” and “slow” 
performers. In order to investigate this 
possibility further, each group of subjects was 
divided into two sub-groups: (1) those whose 
time scores on the Matching Phase sub-test 
(sub-test A-2) fell above the mean; and (2) 
those whose time scores fell below the mean 
on this sub-test. Those groups whose time 
scores fell above the mean, i.e., required more 
time, are hereafter referred to as the “high”
groups; those whose time scores fell below the 
mean are referred to as the “low” groups. 

After this sub-division, the differences of 
means and standard deviations between the 
“high” groups and the “low” groups for both 
measures of performance on sub-test A-4 were
investigated. All differences were too small 
to warrant rejection of the hypothesis that 
there was no true difference of performance 
between the groups. If the 15 per cent level 
of confidence is established as indicating sig- 
nificance, all differences of means and standard 
deviations are readily attributable to chance 
factors. 

In view of these low probabilities of a true 
difference beyond zero existing between the 
groups it can be reasonably concluded that 
the experimental group required as many 
trials to meet a performance criterion and 
spent as much time on their first three trials 
as the control group. 

It will be recalled that certain parts in sub- 
test A-4 could be positioned in a different 
sequence from that prescribed by the instructor 
without being reflected in the time score. The 
differences of means and standard deviations 
between the experimental and control groups 
on the number of such errors committed were 
examined for statistical significance. Only 
those errors committed on the first three trials 
of sub-test A-4 were included in the analysis
since all subjects performed the task a mini- 
mum of three times, but a varying number of 
trials were required to reach the criterion. As 
the number of errors was a function of the 
number of trials on the task, such a comparison 




gave each subject an equal opportunity to 
commit an error. 

It was observed that the “t” ratio of the
mean differences between the experimental
and the control groups is 1.20 and the “t” ratios
of the “high” groups and the “low” groups
are 1.20 and .56, respectively. These differences
are not significant at the 20 per cent
level of confidence. Hence, it is concluded 
that the experimental group did not make 
significantly fewer errors than the control 
group. 


Summary and Conclusions 


The purpose of this experiment was to deter- 
mine the effect of verbalization on the number 
of trials required to learn a manipulative task. 
Fifty-two students at Purdue University were 
divided into two matched groups and individ- 
ually taught to assemble a manipulative task. 
After this instruction, the experimental group 
verbally described each operation of the task 
back to the instructor as they performed it 
and the control group performed the task 
without verbalizing. The groups then repetitively
performed the task until they
reached a pre-established criterion of learning. 
Differences of means and standard deviations 
between the control and experimental groups 
on the number of trials to reach the criterion 
and the number of errors and average time on 
the first three trials were analyzed for their 
significance. None of these differences were 
found to be significant. Each group was then 
divided into two sub-groups and the differ- 
ences between the means and standard de- 
viations of performance were analyzed. Again 
it was found that all differences were so small 
that they could be readily attributed to chance 
variations. 

On the basis of these results, the following 
conclusions were drawn: 


1. The experimental group which verbalized 
or “talked back” their instructions to the 
instructor required as many trials to meet 
a performance criterion and spent as much 
time on their first three trials as did the control 
group. 

2. Similarly, the experimental group did 
not make significantly fewer errors on the 
first three trials than the control group. 
3. The high experimental group whose time
scores fell above the mean on the initial task 
required as many trials to meet a performance 
criterion and spent as much time on their first 
three trials as did the high control group. 


It must be borne in mind that the subjects 
in this experiment were college students and 
that it is possible that quite different results 
would be obtained on a different population. 
It should also be pointed out that subjects 
were required to verbalize while they per- 
formed the task and that they verbalized only 
on one trial. Further studies in this area 


might reveal that verbal description without 
performance and/or verbalization over a 
greater number of trials exerts a positive 
effect on the ability of a learner to perform a 
task. 
A Punched Card Procedure for Use with the Method of
Paired Comparisons 


N. C. Kephart and James E. Oliver 
Division of Education and Applied Psychology, Purdue University 


Problems involving the method of paired 
comparisons require the preparation of in- 
dividual slips with one pair of items on each 
slip. It is desirable that the names be paired 
on separate slips, as opposed to mere presenta- 
tion of two lists of the names so that time and 
space errors may be controlled. (Time and 
space errors relate to the order of presentation 
of pairs and to the relative position of members 
of the pairs, respectively.) Such a slip is 
required for each of the possible pairs of items 
in the study, the number of such pairs being 
equal to N(N-1)/2. The labor involved in 
the preparation and scoring of these pairs has 
been an often repeated adverse criticism of 
the method. 

The materials necessary for such problems 
can be prepared and scored on punched card 
equipment. Control of the relative position 
of members of pairs and the order of presenta- 
tion of pairs is provided. In addition, con- 
siderable saving is made with respect to the 
time required to make and score the pairs. 
For example, an experienced operator can 
produce a deck of 300 pairs (25 names in 
variable list) by punched card methods in 
approximately 30 minutes. Subsequent scor- 
ing would require approximately 10 minutes. 
If a typewriter were used to prepare the pairs, 
12 to 16 hours of clerical labor would be re- 
quired in preparation and scoring. 
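
The pair count grows quickly with the length of the list. The short check below (the function name is illustrative) evaluates N(N-1)/2 for a few list sizes, including the 25-name list mentioned above.

```python
def number_of_pairs(n):
    """Number of distinct pairs among n items: N(N-1)/2."""
    return n * (n - 1) // 2

for n in (6, 7, 25):
    print(n, number_of_pairs(n))
```

For 25 names this gives the 300 pairs cited in the text.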

An example of the procedure for setting up 
paired comparison materials applied to ratings 
of workers on job performance is given below. 
This procedure is systematic and can be di- 
rectly applied to any other problem involving 
the paired comparison technique. 


Preparation of the Pairs 
1. Assign each worker a serial number from 
1 to N. The method of assigning numbers 
makes no difference to the results. Any order 
of the original names can be used but no serial 
number can be used more than once. Punch 




the serial number in columns 1 and 2 and the 
names in columns 5 to 25, punching one card 
for each name. Call this master deck 1. 

2. Reproduce the cards from step 1, chang- 
ing the serial number to columns 3 and 4 and 
the name to columns 40 to 60. Write the 
serial number (punched in each card) on the 
back of each card for the first one-half of the 
cards in the deck. Call this master deck 2. 

3. Reproduce the cards from step 1. (Be 
sure that the serial numbers of these cards are 
in consecutive order.) Call this set 1 and write
a “1” on the back of the last card reproduced,
i.e., label the set. 

4. Reproduce the cards from step 1 again. 
Call this set 2 and write a “2” on the back of
the last card in this deck. Lay set 2 adjacent 
(not on top of) set 1. 

5. Continue reproducing and numbering 
each set of cards from step 1 (set 3, set 4, etc.) 
each time writing the consecutive number of 
each set until the appropriate number of sets 
has been reproduced. If the number of 
names being paired is odd, (N-1)/2 sets are 
needed. If the number of names being paired 
is even, N/2 sets are needed. 

6. The serial numbers on the cards in each 
set are in consecutive order. Be sure the 
cards from step 2 (master deck 2) are in con- 
secutive order. 

7. Reproduce deck 2 into each set that has 
been prepared. 


a. Before deck 2 is reproduced into set 1, 
remove the number 1 card from the bottom of 
deck 2 and place it on top. 

b. Before deck 2 is reproduced into set 2, 
remove the number 2 card from the bottom of 
deck 2 and place it on top. 

c. Before deck 2 is reproduced into set 3, 
remove the number 3 card from the bottom of 
deck 2 and place it on top. 

d. Continue this procedure until all sets 
have been reproduced. 
Fig. 1. Card on which judge’s choice is recorded.

8. If an odd number of names is being 
paired, the appropriate number of pairs will 
be made after the last set has been reproduced 
from step 7. 

9. After pairs are all prepared, reproduce 
random numbers in columns 76-78 from a 
previously prepared deck of random numbers. 

10. Sort cards progressively from 76-78. 

11. Interpret names on cards; do not in- 
terpret serial numbers. Leave considerable 
spacing between the two names. See Figure 1. 

12. If a judge rates on more than one factor, 
each independently, or if several judges are 
rating, more than one deck of paired names 
is needed. After step 10 above, reproduce 
and interpret as many additional decks as may 
be required. As new decks are reproduced, 
pairs will be in correct order for presentation. 


Example: 7 names (cards go in reproducer in this order
when deck 2 is reproduced into each set).

Set 1:   7 6 5 4 3 2 1
Deck 2:  1 7 6 5 4 3 2

Set 2:   7 6 5 4 3 2 1
Deck 2:  2 1 7 6 5 4 3

Set 3:   7 6 5 4 3 2 1
Deck 2:  3 2 1 7 6 5 4

(7 × 6)/2 = 21 pairs


13. When the number of variables is even, 
one-half of the last set in step 7 must be de- 
stroyed as the pairs on the first and last halves 
are the same except that the order within 
pairs is reversed. Destroy one-half of the 
last set reproduced from step 7 and proceed 
with steps 9 to 12. 


Example: 6 names

Set 1:   6 5 4 3 2 1
Deck 2:  1 6 5 4 3 2

Set 2:   6 5 4 3 2 1
Deck 2:  2 1 6 5 4 3

Set 3:   6 5 4 3 2 1
Deck 2:  3 2 1 6 5 4
Destroy one-half of set 3.

(6 × 5)/2 = 15 pairs
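
The card-shifting routine of steps 3 to 13 can be imitated in software. The sketch below is one reading of the procedure, not the authors' machine set-up: it treats the k-th set as pairing each name with the name k places further along the list (wrapping around), keeps only half of the last set when N is even, and shuffles the finished deck in place of the random-number sort. The function name `build_deck` and the `seed` parameter are hypothetical.

```python
import random

def build_deck(names, seed=None):
    """Return every pair of names once, as (left, right) tuples in random order."""
    n = len(names)
    deck = []
    for shift in range(1, n // 2 + 1):          # one "set" per shift
        rows = range(n)
        if n % 2 == 0 and shift == n // 2:
            # Even N: the last set holds each pair twice (order reversed),
            # so only half of it is kept (step 13).
            rows = range(n // 2)
        for i in rows:
            deck.append((names[i], names[(i + shift) % n]))
    random.Random(seed).shuffle(deck)           # stands in for steps 9 and 10
    return deck

for n in (6, 7, 25):
    names = [f"worker {k}" for k in range(1, n + 1)]
    print(n, len(build_deck(names, seed=0)))
```

With 7 names this yields 21 pairs and with 6 names 15. For an odd number of names every name appears equally often in the left-hand and right-hand position, which is the control of space error the method calls for.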


Scoring of the Pairs 


The judge then makes his choice between 
the two members of each pair and indicates 
this choice by a pencil mark. 

14. Sort the cards by hand into two groups: 
Those in which the left-hand member of the 
pair is checked and those in which the right- 
hand member is checked. 

15. Sort progressively on columns 1 and 2 
of the left-hand preferred group and with the 
tabulator (controlling on serial number) count 
the number of choices for each serial number. 

16. Repeat step 15 for the right-hand pre- 
ferred group, sorting and controlling on col- 
umns 3 and 4. 

17. By manually adding the card counts 
obtained for each worker (each serial number) 
on the right and left (steps 15 and 16) the 
total number of choices is obtained. 
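
Steps 14 to 17 amount to a tally of how often each serial number was chosen. A minimal sketch, assuming each judged card is recorded as a (left serial, right serial, choice) triple; that record layout and the name `tally_choices` are illustrative, not part of the card design.

```python
from collections import Counter

def tally_choices(cards):
    """Count choices per serial number from (left, right, choice) records.

    choice is "L" when the left-hand name was marked, "R" for the right.
    """
    left_wins = Counter(left for left, right, choice in cards if choice == "L")
    right_wins = Counter(right for left, right, choice in cards if choice == "R")
    return left_wins + right_wins   # step 17: add the two counts

# Hypothetical judgments on the three pairs among workers 1, 2, and 3.
cards = [(1, 2, "L"), (2, 3, "R"), (3, 1, "R")]
totals = tally_choices(cards)
print(sorted(totals.items()))
```

A worker who is never chosen simply has no entry in the tally, which reads as a count of zero.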


Received April 4, 1951.








Errors of Interpolation in Instrument Reading and Setting 


Charles Martin Levett, Jr. 
Lehigh University 


The importance of linear interpolation is 
apparent in industrial situations which require 
accurate scale readings on instruments or dials. 
If general principles can be established which 
are applicable to all human beings, it is quite 
possible that further research will bring forth 
methods of overcoming errors in dial or in- 
strument reading to a greater degree than is 
possible at present. 

Previous work connected with the study of 
linear interpolation has been described by 
Miller¹ and is therefore omitted here.

Miller’s study of linear interpolation involved 
interpolation in tenths, of five interval sizes 
(1 mm., 2 mm., 3 mm., 5 mm., and 10 mm.). 
Results showed large individual differences to 
be of more importance than interval sizes or 
biases. 

Purpose 

The purpose of this experiment was to study 
the accuracy of interpolation in three differ- 
ent situations: (1) reading from a slide rule 
set by the experimenter, (2) setting of the slide 
rule by the subject, and (3) the 10 mm. Miller 
cards. It is also the purpose of this investi- 
gation to find out if the Miller cards can be 
used for testing linear interpolation in lieu of 
a slide rule arrangement. 


Procedure 


Subjects. Thirty subjects were used ranging 
in age from 18 to 68. Most of the subjects 
were students at Lehigh University; however, 
a number of subjects from various occupations 
were used as well. The students were repre- 
sentative of all three curricula: liberal arts, 
engineering, and business administration. A 
predominant number were psychology under- 
graduate and graduate students. Out of the 
total of thirty subjects eight were women. 

Apparatus. Six Miller cards (10 mm.) were 
used. Each digit from one up to nine was 
represented on these cards six times, making a 
total of 54 readings on each card. The various 
judgments were arranged in random order.


¹ Miller, H. K., Jr. An exploratory study of linear
interpolation. J. appl. Psychol., 1950, 34, 367-370. 


A slide rule without scale divisions or num- 
bers was set up in the following manner. Two 
perpendicular lines 10 mm. apart were drawn 
on the stationary part of the slide rule while a 
third perpendicular line was drawn on the 
movable section. A mirror was attached to 
the movable section. An Army Air Force sight 
gun was used as a light source. The light 
from this source was reflected by the mirror 
on to a scale 1825 mm. away. The scale used 
was 690 mm. long, and each interval on the 
scale was between 67 and 73 mm. long. Con- 
sequently a movement of one mm. on the slide 
rule resulted in an approximate movement of 
70 mm. on the scale. 

The scale was so calibrated and adjusted 
that zero on the stationary slide rule corre- 
sponded to zero on the magnified scale, and ten 
on the stationary slide rule corresponded to 
ten on the magnified scale. 

The movable part of the slide rule was 
manipulated by means of a wheel attached to 
a long threaded bolt which fitted into a nut 
attached to the slide rule. 

A partition was placed between the subject 
and the scale to prevent the subject from 
observing his results as the experiment pro- 
gressed. However it did not block the experi- 
menter’s view. 

An overhead fluorescent fixture and a fluo- 
rescent desk lamp were used for illumination. 
Care was taken to make sure that no shadow 
was cast on the slide rule. The slide rule was 
wiped at frequent intervals to prevent subjects
from using dust specks as reference points. 

The Three Parts of the Experiment. The 
experiment consisted of three separate activi- 
ties. The subject was first required to inter- 
polate the 54 settings on one 10 mm. Miller 
card. Each subject was allowed to interpolate 
without the use of any device which would 
prevent looking back at interpolations already 
made. Subjects could go back and change 
judgments if they desired. The subject re- 
corded his judgments on a mimeographed 
answer sheet with the blanks numbered to 
correspond with the 54 settings on the card. 
No time limit was placed on the subject. 

In the second part of the experiment, the 
experimenter set the movable part of the slide 
rule at various points between the two perpen- 
dicular lines on the stationary part of the rule, 
using extreme care in placing the reference line 
at even tenths. After each judgment the ex- 
perimenter moved the rule back and forth 
several times before putting it in position for
the next judgment. Each tenth was presented 
six times in random order according to a pre- 
arranged key, making a total of 54 judgments. 
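
A prearranged key of this kind is easy to generate. The sketch below only illustrates the design (each tenth from 1 to 9 presented six times, in shuffled order); the seed and the function name are assumptions, and the original key is not reproduced in the paper.

```python
import random

def presentation_key(repeats=6, seed=None):
    """Return tenths 1-9, each repeated `repeats` times, in random order."""
    key = [tenth for tenth in range(1, 10) for _ in range(repeats)]
    random.Random(seed).shuffle(key)
    return key

key = presentation_key(seed=1951)
print(len(key))   # 9 tenths x 6 repeats
```

Fixing the seed makes the same key available for every subject, which matches the idea of a key prepared in advance.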

The third part of the experiment required 
the subject to set the rule at specified tenths 
between the two lines. After each judgment 
the experimenter moved the rule back and 
forth several times before the next number was 
presented. 

A trial consisted of one Miller card (54 read- 
ings), 54 readings from the slide rule, and 54 
settings by the subject. After an hour of 
preliminary practice each subject completed 
six trials divided into two sessions. This re- 
sulted in a total of 324 readings on each of the 
three parts.

In none of these procedures was the subject 
informed of his errors. 


Results 


In Table 1 are shown errors subject-by- 
subject. The column headed “Slide Rule 
Errors” indicates the number of errors in 
reading the slide rule set by the experimenter. 
It can be seen that there are large individual 
differences, since errors range from 0 to 87 
out of 324 judgments. Occupation, age, and 
sex apparently are not determining factors. 

The column headed “Mean Discrepancy in 
Setting” in Table 1 indicates the average 
amount by which subjects missed the exact 
positions when setting the slide to assigned 
numbers. The discrepancies ranged from a 
low of .085 to a high of .421. 


Table 1 


Errors of Subjects 


Column headings (as named in the text): Subject; Occupation; Slide Rule Errors; 
Errors Over .5; Mean Discrepancy in Setting; Miller Card Errors 

Subject initials (eleven legible, in the order listed): G. D., E. D., W. J., R. S., 
E. H., J. D., B. G., H. L., M. F., P. F., ALS. 

Occupations (in the order listed): Psych. Grad.; Ind. Eng. Undergrad.; 
Met. Eng. Undergrad.; Psych. Undergrad.; Mech. Eng. Undergrad.; Secretary; 
Psych. Undergrad.; Chem. Eng. Undergrad.; Psych. Undergrad.; 
Elect. Eng. Undergrad.; Pre-Dental; School Teacher; School Teacher; 
Elect. Eng. Undergrad.; Psych. Undergrad.; Psych. Undergrad.; 
History Undergrad.; Bus. Ad. Undergrad.; Psych. Undergrad.; Psych. Grad.; 
Salesman; Psych. Undergrad.; History Undergrad.; Research Assistant; 
Secretary; Store Clerk; Psych. Undergrad.; Psych. Undergrad.; 
Psych. Undergrad.; Stock Broker 

Mean Discrepancy in Setting (thirteen legible entries, in the order listed): 
.085, .141, .168, .131, .228, .162, .206, .254, .241, .246, .238, .263, .335 
Table 2 


Number of Errors at Each Position 





2 3 


Slide Rule Readings 95 37 
Slide Rule Setting, Errors Over .5 114 139 
Miller Cards 32 117 


The column headed “Errors over .5” refers 
to the number of errors of a magnitude greater 
than .5 when subjects set the slide rule. This 
column is comparable to “Slide Rule Errors” 
in reading. However the numbers are larger 
in most cases, ranging from 0 to 104. Ap- 
parently, most subjects make more gross 
errors in setting the slide than they do in 
reading the experimenter’s settings. 

The last column in Table 1 marked “Miller 
Card Errors” shows the number of errors made 
by each subject on the 10 mm. Miller cards. 
Errors range from 0 to 77, again indicating 
large individual differences. 

In comparing all four of the above mentioned 
columns it can be seen that the subjects who 
had the lowest number of slide rule errors also 
made very few errors in other aspects of the 
experiment. The rest of the subjects do not 
show such consistency. Thus subjects who 
made a large number of errors in slide rule 
readings did not necessarily show large errors 
for “Mean Discrepancy in Setting,” “Errors 
over .5,” or the “Miller Cards.” 

“Slide Rule Errors” correlate .62 with 
“Mean Discrepancy in Setting,” .61 with 
“Errors over .5” and .37 with “Miller Card 
Errors.” This last correlation indicates that 
the two seemingly similar situations are not 
so similar as might appear on superficial ex- 
amination. Miller cards cannot be adequately 
used as a substitute for slide rule readings. 
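
One way to see why a correlation of .37 argues against substituting Miller cards for slide rule readings is to square each coefficient, which gives the proportion of variance two measures share. The sketch below uses only the three correlations reported above; the function name `shared_variance` is illustrative, and squaring is an interpretive step added here, not one the authors report.

```python
# Correlations of "Slide Rule Errors" with the other three measures,
# as reported in the text.
correlations = {
    "Mean Discrepancy in Setting": 0.62,
    "Errors over .5": 0.61,
    "Miller Card Errors": 0.37,
}


def shared_variance(r):
    """Proportion of variance common to two measures correlated r."""
    return r * r


for name, r in correlations.items():
    print(f"{name}: r = {r:.2f}, r^2 = {shared_variance(r):.2f}")
```

On this reading, slide rule reading errors share roughly 14 per cent of their variance with Miller card errors, against about 38 per cent with the mean discrepancy in setting.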


“64 
151 
138 


Table 2 is concerned with errors in relation 
to position on the scale and includes all sub- 
jects concerned. Errors in reading the slide 
rule, in setting the slide rule, and in reading 
Miller cards all show a sharp dip at position 5. 
Errors in setting the slide rule and in reading 
Miller cards are low at positions 1 and 9, but 
this does not hold true for the same positions 
in reading the slide rule. This re-emphasizes 
the fact that reading a slide rule and reading 
Miller cards are not equivalent processes. 

Table 3 is concerned with the general nature 
of biases. Although biases vary somewhat 
from subject to subject some general trends 
can be noted. 

The biases shown for “Slide Rule Readings” 
indicate that subjects tend to use the end 
lines and an imaginary line at position 5 as 
points of reference. For example, at position 
6 there is a strong preponderance of plus errors. 
That is, subjects tend to read an actual 6 as a 7, 
as if they thought of 6 as lying closer to the 
imaginary center line than it does. In making 
settings at position 6 the subjects tend to set 
closer to the center, resulting in a minus mean 
discrepancy. The bias in reading and the 
bias in setting are therefore entirely consistent. 

There seems to be a definite tendency on 
the part of the subjects to think of 1, 2, 8, and 
9 as closer to their respective end lines and 
to think of 3, 4, 6, and 7 as closer to the imagi- 
nary center line than is actually the case. 


Table 3 


Preponderance of Error at Each Position 





Totals 1 3 


Slide Rule Readings +39 +93 +1 
Mean Settings -.14 +.02 
Miller Cards +4 + -37 


Position               4      5      6 

Slide Rule Readings   -54    -11    +73 
Mean Settings        +.23   +.08   -.16 
Miller Cards         -126    -10   +148 
Miller cards show biases which are similar 
to those found in “Slide Rule Errors,” although 
the degree of bias is greater at some positions 
and less at others. 


Summary 

Thirty subjects of both sexes and of various 
ages and occupations were required to make 
interpolations between marks 10 mm. apart 
in three different situations: (1) reading from 
a slide rule set by the experimenter to exact tenths, 
(2) setting the rule to tenths, and (3) reading 
tenths from Miller cards. Each subject made 
324 interpolations in each of the three situations. 

Results showed large individual differences 
among subjects. Slide rule reading errors 
ranged from 0 to 87. Slide rule setting errors 
in excess of half a unit ranged from 0 to 104. 
Errors in reading Miller cards ranged from 0 
to 77. The correlation between slide rule readings 
and the mean discrepancy in setting was .62, 
that between slide rule readings and errors over 
.5 was .61, and that between slide rule readings 
and Miller card readings was .37. 

In all three methods errors were made less 
frequently at position 5 than at any other. 

Readings at positions 1, 2, 8, and 9 showed 
an inward bias, possibly due to the use of the 
end lines as reference points. Readings at 
positions 3, 4, 6, and 7 showed an outward 
bias, possibly due to the use of an imaginary 
line at the center as a reference point. 
A Note on “Simplification of Flesch Reading Ease Formula” 


George R. Klare 
The Psychological Corporation, New York City 


In a recent article (3), Farr, Jenkins and 
Paterson proposed a simplification of the 
Flesch “Reading Ease” formula (4). In the 
simplification, Flesch’s average sentence length 
factor remains unchanged, but number of 
one-syllable words is substituted for Flesch’s 
factor of number of syllables per 100 words. 
The rationale for this change is that the authors 
feel that analysts often have difficulty in 
counting the number of syllables in poly- 
syllabic words; that the “simpler method 
would obviously be much faster and would 
require no knowledge of syllabification on the 
part of the analyst. It would merely require 
the analyst to recognize and count the number 
of one syllable words.” 
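
For orientation, the two formulas under discussion can be written side by side. The coefficients below are those usually quoted for Flesch's formula and for the Farr, Jenkins and Paterson simplification; they are not stated in this note and should be read as an assumption of the sketch, as should the function names and the sample passage.

```python
def reading_ease_old(words_per_sentence, syllables_per_100_words):
    """Flesch Reading Ease: sentence length plus syllables per 100 words."""
    return (206.835
            - 1.015 * words_per_sentence
            - 0.846 * syllables_per_100_words)


def reading_ease_new(words_per_sentence, one_syllable_words_per_100):
    """Simplified form: sentence length plus count of one-syllable words."""
    return (1.599 * one_syllable_words_per_100
            - 1.015 * words_per_sentence
            - 31.517)


# Hypothetical passage: 15 words per sentence, 150 syllables and
# 70 one-syllable words per 100 words.
old_score = reading_ease_old(15, 150)
new_score = reading_ease_new(15, 70)
print(round(old_score, 1), round(new_score, 1))
```

The sentence length term is identical in both versions, so any difference between the two scores for a given passage comes entirely from the word factor.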

There seems little doubt that many analysts, 
particularly inexperienced ones, have difficulty 
in recognizing syllables. Even language ex- 
perts find syllables hard to define (5). This 
writer would like to raise several questions, 
however, about the two advantages claimed 
for counting one-syllable words over counting 
all the syllables in words. 

It is stated that the simpler method would 
require no knowledge of syllabification on the 
part of the analyst. How can an analyst with 
no knowledge of syllabification recognize and 
select one-syllable words from the mixture of 
one and one-plus syllable words found in 
writing? By letter length? Compare “piano” 
and “piece.” By number of vowels? Com- 
pare “badge” and “banjo.” By singular 
versus plural forms? Compare “govern” and 
“grounds.” By pronunciation? Compare 
“flower” and “flour.” All of the above words 
are familiar to most fourth grade pupils (1). 
The problem could be magnified by presenting 
less “familiar” words as examples. It may 
well be that very nearly as much knowledge 
of syllabification is required, or at least as 
much difficulty met, in selecting one-syllable 
words as in counting all the syllables in words, 
unless, of course, the analyst uses a list of 
one-syllable words. Since such a list would 
have to include several thousand words (2,6), 
speed of application would be sacrificed. 

The second question, then, is whether or 
not the simpler method would, as stated, 
“obviously be much faster.” Since Table 2 of 
the authors’ article indicates that most writing 
contains a majority of one-syllable words, it 
seems probable that about the same majority 
of the remaining polysyllabic words should 
contain but two syllables. An analyst with 
a knowledge of syllabification would find 
relatively few many-syllabled words to count. 
Would a significant amount of time be saved 
by the proposed method over the method of 
“reading silently aloud” and counting all 
syllables? 

A third and related question also arises. 
Would not each counting error be magnified, 
and reliability decreased, by the new method? 
Since writing contains more syllables than 
one-syllable words, this would seem to be the 
case unless one could be assured that the 
analyst would make several syllable errors 
to each word error. 
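
The third question can be given a rough numerical form. Assuming the weights usually quoted for the two formulas (0.846 per syllable in the old, 1.599 per one-syllable word in the new; neither figure appears in this note), the sketch below compares how far a single miscounted item moves the final score. The names are illustrative only.

```python
# Assumed weights on the word factor in each formula (not given in the note).
OLD_WEIGHT_PER_SYLLABLE = 0.846
NEW_WEIGHT_PER_ONE_SYLLABLE_WORD = 1.599


def score_shift(weight, miscounted_items):
    """Change in reading ease score produced by a counting error."""
    return weight * miscounted_items


old_shift = score_shift(OLD_WEIGHT_PER_SYLLABLE, 1)
new_shift = score_shift(NEW_WEIGHT_PER_ONE_SYLLABLE_WORD, 1)
ratio = new_shift / old_shift   # syllable errors equivalent to one word error
print(round(old_shift, 3), round(new_shift, 3), round(ratio, 2))
```

Under these assumed weights one word error moves the score about as far as 1.9 syllable errors, so the new method matches the old in reliability only if analysts make nearly two syllable errors for every word error.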
Reply to “Simplification of Flesch Reading Ease Formula” 


Rudolf Flesch 
Dobbs Ferry, New York 


In their paper “Simplification of Flesch 
Reading Ease Formula” (1), Farr, Jenkins, 
and Paterson propose to replace the count of 
syllables per 100 words in my Reading Ease 
Formula (3) by a count of the number of one- 
syllable words per 100 words. Their proposal 
is open to criticism on three grounds: 

1. The raison d’être for the new formula is 
that it is a “simplification.” Farr, Jenkins, 
and Paterson say that “this simpler method 
would obviously be much faster and would 
require no knowledge of syllabication on the 
part of the analyst.” However, this is not 
obvious at all. The average number of one- 
syllable words per 100 “standard” words is 
about 70 (see Table 1 below), whereas the 
average number of syllables per 100 “standard” 
words is about 150. Since syllables are counted 
practically by counting only those beyond the 
first, this means that the new formula, on the 
average, counts 70 items where the old formula 
counted only 50; in other words, the “faster” 
new formula typically means about 40 per cent 
more work in measuring the word factor. As 
to knowledge of syllabication, the analyst 
needs just as much for the one count as for 
the other; in fact, words like “stirred,” “fire,” 
or “doesn’t” are more liable to offer problems 
than words like “simplification” or “service- 
ability.” So the “simplified” formula may 
well be more cumbersome than the old one. 

2. Farr, Jenkins, and Paterson based their 
new formula on 360 100-word samples from 
22 General Motors employee handbooks. The 
handbooks ranged from 36 (“Difficult”) to 
57 (“Fairly Difficult”). According to the 
authors, “It is safe to predict” that the re- 
ported correlation of .95 between the old and 
the new formulas would reach .99 if material 
on every level from “Very Easy” to ‘Very 
Difficult” had been sampled. That predic- 
tion is not safe. The correlation that holds 
for 22 homogeneous GM handbooks within 
the narrow range of 36 to 57 is apt to drop 
rather than rise when heterogeneous materials 
over a wide range of Reading Ease Scores are 
sampled. To illustrate, I offer Table 1, which 
shows the comparative counts and scores for 
the 11 examples given in my booklet How to 
Test Readability (2). The examples range 


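Table 1 


Comparison of 11 Examples from How to Test Readability as to Two Syllable Counts 
and Two Reading Ease Scores 


Column headings: Example; No. of One-Syllable Words per 100 Words; 
No. of Syllables per 100 Words; Reading Ease Score (New Formula); 
Reading Ease Score (Old Formula) 

Legible entries, by column, in the order printed: 

No. of One-Syllable Words per 100 Words: 80, 82 
No. of Syllables per 100 Words: 122, 124, 131, 127, 141, 144, 145, 152, 164, 143, 175 
Reading Ease Score (New Formula): 84, 86 
Reading Ease Score (Old Formula): 91, 89, 81, 80, 69, 68, 66, 48, 47, 29 
Entry whose column cannot be determined: 73 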
from the Bible (Example 1, “Very Easy”) 
to John Dewey (Example 11, “Very Difficult”). 
As Table 1 shows, the two formulas give 
roughly equivalent results around the “standard” 
score of 65, but tend to diverge more and 
more toward either end of the scale. The 
“ceiling” for one-syllable words per 100 words 
seems to be about 80, the “floor” about 60. 
So the new formula apparently tends to under- 
rate both ease and difficulty. Over the full 
range, it appears to be less sensitive than the 
old formula. 

3. The proposed new formula seems a step 
in an undesirable direction, resulting in a 
cruder rather than a more precise measure. 
In effect, Farr, Jenkins, and Paterson reduce 
readability to the use of short sentences and 
one-syllable words. This is simplification 
with a vengeance, giving fresh ammunition 
to those critics of readability measurement 
who dismiss it as a movement toward “baby 
talk” or “primer style.” After all, the count 
of one-syllable words was abandoned by stu- 
dents of readability some fifteen years ago. 
To return to it now would coarsen the tech- 
nique of readability measurement and impair 
its value as a diagnostic tool in the improve- 
ment of communication. 
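
The arithmetic behind the first criticism can be retraced briefly. The sketch takes the two averages given in point 1 (about 70 one-syllable words and about 150 syllables per 100 “standard” words) and assumes, as stated there, that the analyst counts only syllables beyond the first in each word. The function names are illustrative.

```python
def items_counted_old(syllables_per_100_words, words=100):
    """Old method: the first syllable of each word is taken for granted,
    so only the syllables beyond the first are tallied."""
    return syllables_per_100_words - words


def items_counted_new(one_syllable_words_per_100):
    """New method: every one-syllable word is tallied."""
    return one_syllable_words_per_100


old_items = items_counted_old(150)                  # 50 items
new_items = items_counted_new(70)                   # 70 items
extra_work = (new_items - old_items) / old_items    # 0.40
print(old_items, new_items, f"{extra_work:.0%}")    # prints: 50 70 40%
```

The 40 per cent figure depends on the assumption that tallying a one-syllable word costs about as much effort as tallying one extra syllable; the comparison says nothing about the time each kind of judgment takes.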
Published out of turn by the editor. 


References 

1. Farr, J. N., Jenkins, J. J., and Paterson, D. G. 
Simplification of Flesch reading ease formula. 
J. appl. Psychol., 1951, 35, 333-337. 

2. Flesch, R. How to test readability. New York: 
Harper & Brothers, 1951, pp. 56. 

3. Flesch, R. A new readability yardstick. J. appl. 
Psychol., 1948, 32, 221-233. 

Reply to Klare and Flesch re “Simplification of Flesch 
Reading Ease Formula” 


James N. Farr, James J. Jenkins, Donald G. Paterson, and George W. England 


Department of Psychology, University of Minnesota 


Both Klare (5) and Flesch (3) attack the 
statements that this simpler method “would 
require no knowledge of syllabification on the 
part of the analyst” (2, p. 333) and it “would 
not require knowledge of syllabification on 
the part of the analyst” (2, p. 337). The 
writers plead guilty to careless overstatement. 
What should be substituted is a statement to 
the effect that this simpler method requires 
less precise knowledge of how to break up 
polysyllabic words into their components be- 
cause such words are to be ignored. Even 
though we admit that the simpler method re- 
quires almost as much knowledge of syllabifi- 
cation as does the more complex method the 
chief advantage of the simpler method lies in 
the fact that it would obviously be much faster. 

But both Klare and Flesch deny that the 
simpler method would be faster. Klare, after 
considering the relationship between one- 
syllable word counts and the number of 
syllables per 100 words merely asserts his 
belief that no significant amount of time would 
be saved. Flesch, using similar reasoning, 
also denies that the simpler method would 
be faster. He even goes so far as to argue 
that about 40 per cent more work would be 
required when the simpler method is used. 
As the great Chicago physiologist, Anton J. 
Carlson, was so fond of saying, “Vass iss de 
evidence?” 

Instead of using armchair reasoning, arrangements 
were made in October, 1951 to 
obtain the “evidence.”1 The mean time in 
seconds for making the one-syllable word 
counts and looking up the reading ease scores 
in the Farr, Jenkins, and Paterson table (2) 
was 82 with a standard deviation of the distribution 
of 36.8. The mean time for making 
the syllable counts and looking up the reading 
ease scores in the Farr and Jenkins table (1) 
was 147 with an S. D. of 62.8. Thus, the 
evidence fully substantiates the claim that 
the simpler method is obviously faster. 


1 Graduate students in Mr. Paterson’s Seminar in 
Applied Psychology were asked to use the new and the 
old method of computing reading ease scores for 201 
hundred-word samples in 49 house publications using 
an AB, BA design. This work was done under the 
immediate supervision of George W. England. The 
writers are grateful to the following students: Robert 
Becker; Sarah Ruth Cook; Ellen A. Corcoran; Benno 
G. Fricke; Richard S. Hatch; Raymond C. Lee, Jr.; 
Richard C. Maass; Paul W. Maloney; Ernest L. McCollum; 
Shanna McGee; Charles Newstrom; and Clair 
A. Peterson. 


Table 1 


Comparison of Eleven Examples from How to Test Readability as to Two Syllable Counts and Two Reading 
Ease Scores as Computed by Flesch, by Paterson, and by Mueller 


Printed column groups: Number of One-Syllable Words per 100 Words Computed by 
R. F.*, D. G. P.*, J. M. M.*; Reading Ease Scores, New Formula (D. G. P., J. M. M.) 
and Old Formula (R. F.). Only one column of one-syllable counts and one column of 
new-formula scores are legible, and the rater for each cannot be identified. 

           One-Syllable Words    New Formula    Old Formula 
Example    per 100 Words         Score          Score (R. F.) 

   1            80                  84              91 
   2            81                  85              89 
   3            77                  77              81 
   4            77                  72              80 
   5            72                  65              69 
   6            71                  65              68 
   7            72                  65              66 
   8            69                  61              60 
   9            70                  60              48 
  10            71                  43              47 
  11            61                  36, 36          29 


*R. F. refers to Flesch, D. G. P. refers to Paterson, and J. M. M. refers to Mr. Paterson’s secretary, Mrs. 
Joyce Mark Mueller, who independently rated these eleven samples and also independently rated the additional 
seven samples in Table 2. 
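
The size of the reported time difference can be put in relative terms. The sketch uses only the two mean times reported for the timing study (82 and 147 seconds per sample); the variable names are illustrative, and no significance test is attempted because the number of raters and the pairing of observations are not given.

```python
# Mean seconds per 100-word sample, as reported for the timing study.
new_mean = 82.0    # one-syllable word count plus table lookup
old_mean = 147.0   # full syllable count plus table lookup

saving = old_mean - new_mean           # seconds saved per sample
relative_saving = saving / old_mean    # fraction of the old time saved
speedup = old_mean / new_mean          # old time as a multiple of the new

print(saving, round(relative_saving, 3), round(speedup, 2))
# prints: 65.0 0.442 1.79
```

On these means the simpler method takes a little over half the time of the old one, a saving of about 44 per cent per sample.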

Klare believes that the simpler method 
would magnify each counting error and thus 
decrease reliability. A thorough-going study 
of the reliability of both methods would be 
needed to settle this issue. Data given below, 
however, would not lead one to put much 
stock in Klare’s belief. 

Flesch attacks our notion that r between the 
old and the new reading ease scores will be 
higher for more heterogeneous materials than for 
the employee handbooks used in developing the 
simpler formula. He gives data for eleven 
samples from his How to Test Readability (4) 
and claims that the new formula is not as 
sensitive as the old because the ceiling for one- 
syllable words seems to be 80 and the floor 
to be about 60. He therefore concludes that 
the new formula underrates both ease and 
difficulty. 

We have recomputed the data for the same 
eleven samples as a check on the reliability 
of the counting procedure and as a check on 
the claim that the new formula introduces a 


systematic bias at the extremes. It will be 
noted that no such bias really exists. Further- 
more, although slight differences in the one-syl- 
lable word counts are shown, no serious errors 
in the new reading ease scores are involved. 
Also one will note that the new reading ease 
scores and the old reading ease scores are 
comparable except for Example 9. This is a 
passage taken from Thorstein Veblen’s Theory 
of the Leisure Class and the discrepancy appears 
to be due to the fact that when Veblen uses a 
polysyllabic word he does so with a vengeance 
jumping to five, six, and even seven syllable 
words. It is probable that the old formula 
gives a better measure of the difficulty of his 
writing than does the new although a study 
of a large number of samples from all parts of 
his book might not show the same discrepancy. 

Flesch’s claim that the new formula is not 
as sensitive at the extremes of difficulty is not 
borne out by the seven additional samples 
taken from the rest of Flesch’s book (4). The 
data are presented in Table 2. Examples 12 
and 17 show the new formula to be more 
sensitive than the old in the sense of yielding 
a lower reading ease score. Example 16 shows 
the reverse effect—the new formula yields a 
higher reading ease score. But all of this is 
Table 2 


Comparison of Seven Additional Examples from How to Test Readability as to Two Syllable Counts and 
Two Reading Ease Scores as Computed by Paterson and by Mueller 











Number of One-Syllable Words 
per 100 Words Computed by: 


Example D.G. P. J.M.M. 
12 57 
13 78 
14 75 
15 
16 
17 
18 


much ado about nothing since both formulas 
yield substantially equivalent results. 

Flesch’s final criticism is that the new for- 
mula is a step in the wrong direction making 
reading ease formulas even more vulnerable 
to the charge of encouraging “baby talk” and 
“primer style.” We, too, deplore this type 
of charge because it is unfair but we believe 
that no larger proportion of the “literary 
stylists” will attack the new formula than 
have been attacking the old formula on these 
grounds. Flesch, himself, has given effective 
answers to this “baby talk” type of attack in 
his How to Test Readability (4, pp. 40, 41, 45, 48, 
49, and 50). 

In conclusion, we would stress “time saving” 
as the great virtue of the new formula. It is 
to be hoped that this “time saving” virtue 
will lead to a far greater utilization of Flesch’s 
important contribution in a greater variety 
of situations than is now the case. As matters 
stand, there is reason to believe that many 


Reading Ease Scores 


New Formula 


Old Formula 


D.G. P. J.M.M. R.F. 


12 1: 31 
81 83 82 
73 d 72 
22 
85 
33 
82 


practical people think that it takes an expert 
to make readability studies. The purpose 
of the new formula with the table for facilitat- 
ing the computation of reading ease scores 
is to persuade practical men to use it in their 
daily work. 
Correction 


Balinsky, B., Blum, M. L., and Dutka, S. 
The coefficient of agreement in determining 
product preferences. J. of Appl. Psychol., 
1951, 35, 348-351 (Oct.). 

Between the time that galley proofs were 
approved by the authors and the publication of 
the above article a number of unfortunate errors 
appeared. Correction of these errors is important 
if proper application of the formula 
is to be made. Whenever [symbol illegible] appears in 
the article it should be [symbols illegible]. Statistically, 
these symbols have very different 
meaning. 
The incorrect substitution appears as follows: 
(1) Page 348, column 2, last line; (2) Page 
349, column 1, line 12; (3) Page 349, column 1, 
line 14; (4) In formula on Page 349, column 1; 
(5) In formula on Page 349, column 2; (6) In 
formula Page 350, column 2; it must also be 
noted that a printer’s error occurred since the 
n in the formula was inverted; and (7) Foot- 
note on Page 350, column 2. 

The computation, following the formula on 
page 350, column 1, was incorrectly inserted 
by the authors. The number .0473 should be 
substituted for .095; 35.1 should be substi- 
tuted for 63.7; and .01 should be substituted 
for .001. Fortunately the correction of these 
values in no way changes the conclusions. 
Received December 29, 1951. 

Published out of turn by the editor. 
Welford, A. T. Skill and age: an experimental 
approach. London: Oxford Univ. Press, 
1951. Pp. 161. $1.75. 

This is a research report issued by the 
Nuffield Foundation at Cambridge covering 
work done there during the years 1946 to 1948 
by the Research Unit into Problems of Aging. 
It is concerned with the study of performance 
or skill changes that come with increasing age. 

Following a short introduction, the first 
part of the book is devoted to a theoretical 
discussion of these performance changes. 
Theoretical explanations are discussed under 
the four broad categories of bodily changes, 
features of the environment, methods of dealing 
with situations, and anticipatory adjustments. 
A discussion of the “mechanisms of skilled 
activity within an individual” follows. Here 
the author makes a convincing case for the 
use of a new research design for studying per- 
formance. This design emphasizes the im- 
portance of accurate measurement of com- 
ponents of total performance so that methods 
used in achieving this performance as well as 
end results can be determined. 

After a brief statement on sampling diffi- 
culties, five laboratory experiments dealing 
with manipulatory skills are reported. Sub- 
jects ranged in age from 18 to 82. Each of the 
experiments illustrates the utility of the ex- 
perimental design proposed in the book. The 
findings point to a compensatory change of 
performance with age in which deterioration 
in one aspect of total performance tends to be 
somewhat offset by improvement in another. 

Four laboratory experiments on non-motor 
performance (classified by the author as in- 
volving “mental skills”) make up the next 
section. A general falling off of these non- 
motor skills with age is reported. 

The final section on experimental results 
deals with studies made in the industrial 
situation, involving a total of 3,211 employees 
of 24 concerns. A modification of the same 
research design as that used in the laboratory 
was applied in the industrial setting. Total 
job performance is broken down into oper- 
ations, and age distributions for jobs with 
and without certain operations are compared. 
Two operations studied were found to be 
associated with the age of the workers. Older 
workers are found more often on jobs in which 
there is no time-stress or pressure for speed, 
and they are more likely to be found in small 
rather than large work groups. 

There is an appendix showing statistical 
significance for all of the findings, and a 23- 
item bibliography. 

As the author points out, general conclusions 
are not justified because of sampling limit- 
ations and the relatively small numbers of 
subjects involved in most of the experiments. 
However, this does not seriously detract from 
the main value of the report which lies in the 
research technique for studying performance. 
The experiments clearly demonstrate the 
usefulness of the technique. The findings 
reported and the discussion of these findings 
suggest a number of hypotheses for other 
researchers in this area. 

Because experimental results are supple- 
mented with stimulating theoretical specula- 
tion into the “why’s” behind the findings, 
the book makes highly interesting reading. 
It is a “must” for other research workers in 
this field and is worth while reading for any 
experimentalist in the field of human behavior. 
Appearing at a time when we in America are 
becoming more and more aware of the increas- 
ing proportion of older people in our popula- 
tion and the problems that go along with this 
condition, this book should be received as a 
welcome addition to the relatively sparse 
research literature in this important area of 
applied psychology. 

Theodore R. Lindbom 

Prudential Insurance Co., 

Newark, N. J. 


Dooher, M. Joseph, and Marquis, Vivienne 
(eds.). Rating employee and supervisory 
performance. New York: American Management 
Association, 1950. Pp. 192. $3.75 
(paper bound). $4.00 (cloth bound). 

This manual of merit-rating techniques 
brings together some of the best merit rating 
material published by the American Manage- 
ment Association during past years. Some 
new material is included. The volume is 
comprised of seventeen chapters (including 
exhibits and appendix) by fourteen contrib- 
utors. Included are sections on basic prin- 
ciples and techniques, scientific approach 
toward rating, special adaptations, company 
case histories, applying rating results, and the 
rating form. 

The quality of contributions is uneven, but 
less so than is usually true of books of this 
type. The editors have limited themselves 
to articles prepared for A.M.A. publications, 
but within this area have used excellent dis- 
crimination. True, overlap and repetition 
are frequent; but fortunately arise in contexts 
which give the impression of intentional (and 
deserved) emphasis. 

Included are some classic studies (e.g., 
Driver’s “Case history in merit rating,” first 
published in 1940) which well warrant re- 
publication to be more readily available to 
current readers. Unfortunately this book 
does not cite source or date of original publica- 
tion. This is apt to be a disadvantage to 
compilers and users of bibliographies. 

Most of the articles are written by psycho- 
logists for non-psychologists, and are sound 
both theoretically and practically. Compiled 
primarily for executives, supervisors, and 
personnel technicians, this book will be valu- 
able to all workers in the field of merit rating 
whether professional psychologists or other- 
wise. In fact, no such worker should be 
without a copy. The cloth bound volume 
would be recommended over the paper bound 
volume even if the difference in cost were 
considerably greater than it is. 


C. E. Jurgensen 


Minneapolis Gas Company 


Spearman, C., and Jones, L. W. Human 
ability. New York: The Macmillan Co., 
1950. Pp. 198. $2.50. 

Factor analysis began with Spearman, and 
it is fitting that he should be honoured for this. 
There is all the more reason for regretting that 
this last book of his was written. The exact 
contribution of Wynn Jones is not mentioned, 


but in the preface Wynn Jones states, “. . . 
when he (Spearman) proposed that the book 
should appear under our joint names, I had to 
point out that my share in it did not deserve 
such distinction.” It seems, therefore, that 
Spearman was responsible for the greater part 
of the work. The book is sub-titled, “A Con- 
tinuation of the Abilities of Man,” and it is 
just this which makes the criticism so adverse. 
In fact Human ability is little more than a 
restatement of the earlier work. Even the 
same quotations reappear. 

It is true that there is a fuller discussion of 
group factors, and it is in some ways depress- 
ing to realize that Spearman was very near 
the truth when he said that group factors 
“. . . have been small or rare,” for this makes 
vocational guidance, if not selection, a much 
more difficult problem. Again, Spearman 
deserves credit for attempting to give his 
factors more psychological meaning than do 
many others. He is not merely content to 
play with numbers; but this we knew before. 


Douglas Irvine 


National Institute of Industrial Psychology, 
London, England 


Panel on Psychology and Physiology of the 
Committee on Undersea Warfare of the National 
Research Council. A survey report 
on human factors in undersea warfare. 
Washington, D. C.: National Research 
Council, 1949. Pp. 541. $2.25. 


“Should another war come, victory may 
well be, not on the side of the strongest battalions, 
not even on the side of the best guided 
missiles, but on the side which has gained a 
vital 10 per cent in the successful handling 
of human factor problems.” This thought has 
apparently guided the metamorphosis of a 
report on research needs for a part of the 
National Military Establishment into what 
amounts to a survey of the field of applied 
experimental psychology. While there is al- 
ways reference to the psychological aspects 
of submarine operation in particular, the 
material surveyed is relevant to a wide range 
of industrial and military problems of human 
efficiency. 

With some 30 contributors producing 23 
data-packed chapters on as many special prob- 
lems, a cohesive review is difficult. However, 
D. R. Craig and D. G. Ellson, in their discus- 
sion of the design of controls, present a concept 
around which the far-flung content may be 
organized. This is the concept of the operator 
as a bio-mechanical link in a control system. 
If the reader holds this concept in mind, the 
diverse topics can be seen as special aspects 
of the input and output problems of such a 
link. 

Under input factors, the whole topic of 
sensation comes to the fore as a useful tool 
for gauging the optimal and limiting condi- 
tions for the reception of information. The 
survey devotes several chapters to the basic 
and applied principles of vision and audition 
in the performance of military tasks. Design 
of dials, complex visual displays, discrimin- 
ations in radar and sonar, are but a few of the 
special topics discussed. 

As a link in a control system, the individual 
responds to the information he receives with 
appropriate activity. Analysis of this ac- 
tivity, conceived as output, leads to the study 
of motor skills. These are reviewed here in re- 
lation to movement accuracy, control panel 
design, and the arrangement of working areas. 

The correlation between input and output 
may be influenced by a large number of factors 
which may be designated as operator variables. 
Surveys of performance as affected by condi- 
tions of habitability, emotional stress, and as 
a result of selection and training fall in this 
category. Inclusion of these topics removes 
the possibility of construing the concept of the 
operator as an instance of mechanomorphism. 
Morale, leadership, mental health, childhood 
training, etc., are all considered in the light 
of their influence upon the efficiency of oper- 
ation. Thus, the operator is not viewed as a 
piece of machinery, even though his contri- 
bution to a man-machine system is the central 
consideration. 

For the applied fields of industrial and 
military psychology, or for any field in which 
the efficient utilization of manpower is a prob- 
lem, this survey provides a factual reference 
and a guide to needed research that is far more 
valuable than its title implies. The approach 
to behavior here presented may, indeed, prove 
to be as crucial for our times as is suggested 
in this review’s opening quotation. 


Wallace A. Russell 


University of Minnesota 


Berrien, F. K. Comments and cases on human 
relations. New York: Harper and Brothers, 
1951. Pp. xi + 500. $4.50. 

This volume is another of those inspired by 
the case study techniques of the Harvard 
Graduate School of Business Administration. 
As the author says, “This book is the out- 
growth of a somewhat accidental but extremely 
stimulating experience in the fall and early 
winter of 1945, when I had the privilege of 
watching for one semester the initial instruc- 
tion in Human Relations at Harvard College.” 
The approach in this book is that the psychol- 
ogy of human behavior can be taught in the 
form of “human relations.” The author 
indicates the objective of the book as, “I 
think of my audience as being composed of 
college students and members of adult educa- 
tion classes—not teachers or experts. At the 
same time I have tried to ‘keep the cookies on 
the middle shelves’—high enough to require 
some stretching but still within reach. This 
is no attempt to build a system or a grand 
theory of human relations. I have tried, how- 
ever, to develop a picture of human relations 
problems in a self-consistent manner around 
the theme of a need for social harmony and 
self-actualization.” 

The book is divided into two parts. Part I 
consists of a number of chapters based around 
various human relations concepts. This de- 
velopment constitutes the first 246 pages. 
Part II consists of a series of case studies— 
some 28 in number. Included at the end of the 
book is a very short Instructor’s Appendix, 
the purpose of which the author explains: “The 
Human Relations course, for which these 
comments and cases are designed, presupposes 
a method of teaching and a set of objectives 
which merit discussion.” He further states, 
“This is not a ‘buttoned-up,’ packaged course. 
There is room for a great deal of individual 
initiative and variation on the part of instruc- 
tors to find more effective means of developing 
responsibility in students for their own educa- 
tional growth.” 
There can be no quarrel with the purpose 
or intent of the author in devising this volume. 
However, a book of this kind, which is designed 
for adult educational groups, probably would 
have been more effective if it had been more 
simply written. For example, the author 
says on page 46, “So far in this chapter we 
have pointed out the subjectiveness of ob- 
servations and the importance of perceiving 
differences among similarities. The task of 
synthesizing our many discrete observations 
into some kind of order remains before us.” 

After spending two chapters in developing 
the need and the setting for the problem of 
cooperative behavior, the author proceeds 
with a rather complete discussion of the prob- 
lem of words. Here again, it seems to this 
reviewer that the author falls into the trap 
of using extremely complicated words and 
phrases to explain human relations problems. 
For example, on page 26, he says, “We have 
at least two kinds of words, or, more precisely, 
words may have two kinds of meanings. The 
semanticists refer to these two kinds of mean- 
ings as extensional and intensional. The 
extensional meaning of a word is the object 
or event in the objective world which the word 
denotes.” Here again it is difficult to con- 
ceive of a group of adults, unless they are 
familiar with the background of semantics, 
being able to understand, without a great 
deal of assistance, what the meanings of these 
words are. If the study group in “human 
relations” is going to spend a good portion 
of its time in developing a theory of “human 
relations” and “human behavior” prior to 
the time they are going to use the case studies, 
then this book in itself is incomplete. 

The book is well annotated, and reference 
is constantly made to the most recent liter- 
ature. 

Howard P. 


Minneapolis-Honeywell Regulator Company 
Gouldner, Alvin W. (ed.). Studies in leader- 
ship. Leadership and democratic action. 
New York: Harper and Brothers, 1950. 
Pp. xvi + 736. $5.00. 


Dr. Gouldner has assembled 31 papers which 
he groups and presents in five parts as follows: 
types of leaders; leadership and its group 
settings; authoritarian and democratic leaders; 
ethics and technics of leadership; and affirma- 
tions and resolutions. At the beginning of 
each part, the author has written a section 
which is helpful in orienting the reader in the 
group of papers which follows, and the author’s 
introduction is an excellent discussion of some 
of the theory and problems involved in the 
study of leadership, particularly the contro- 
versy between the trait and situationist ap- 
proaches. 

Most of the papers were written by sociol- 
ogists, many of them being unpublished pre- 
viously. However, although this type of 
volume is needed and could prove very helpful 
to psychologists, the coverage is such that 
much of its possible value is lost. While the 
book carefully points out the mass of literature 
which exists on leadership, the material pre- 
sented offers little in the way of specific 
problems, specific methodologies, or specific 
theories supported by experimental evidence. 
Therefore, although the mass of literature 
again is increased considerably, the study of 
leadership generally is neglected. The paper 
by Eaton, “Is scientific leadership selection 
possible?” constitutes the single major ex- 
ception to this, although he limits his discussion 
too specifically to military studies and certain 
sociological approaches, and neglects such 
important considerations as criteria for leader- 
ship and an evaluation of the devices discussed. 
The recent work of Gardner, Carter, and 
Henry, the long-term research of the Ohio 
State Leadership Studies, the review of leader- 
ship trait literature by Stogdill, as well as 
other recent experimental contributions by 
psychologists and sociologists, receive no at- 
tention by Eaton or anywhere in the volume. 

It might appear that Dr. Gouldner selected 
papers for their value in contributing to and 
propounding certain social theories and/or 
social reforms and that leadership phenomena 
are considered and evaluated in a particular 
social frame of reference. There is, then, a 
general assumption that democratic leadership 
as these writers discuss it is the leadership to- 
ward which all should be striving, even though 
there is little attempt to structure the “demo- 
cratic action” which is the sub-title of the 
book. Perhaps the following statement from 
Gouldner’s summary remarks on all of the 
papers characterizes the bias of the approach: 
“In terms of social policy, there is a tendency 
to assume that the era of ‘free competition’ 
and ‘laissez faire’ is well behind us, that some 
form of planning is inevitable, but that this 
involves real dangers to democratic liberties 
for which new safeguards have to be invented, 
that the ‘nationalization’ of industry is not 
identical with its ‘socialization.’” The ap- 
proach is illustrated further in the subject 
index which lists 51 references to “apathy,” 
but no general references to “criteria” or 
“methods”; 88 references to leadership among 
various minority groups and trade unions, 
but no general references to business, in- 
dustrial, religious, or educational leadership. 

One of the valid criticisms of many leader- 
ship studies in the past has been the indefinite- 
ness and vagueness which characterized them. 
On the other hand, much recent work has been 
characterized by the attempt to structure, 
to formalize, to coordinate and organize some 
of the theory of leadership and the methods 
for experimentation and study. Gouldner’s 
readings make no major contribution to this 
recent work, and the value of the book is thus 
proportionately decreased. However, much 
of the reading is interesting, including the 
emphasis which is placed on the thinking of 
Freud, Weber, Marx, Durkheim, and Mann- 
heim as it relates to leadership and social 
action, as these writers see it. 


C. G. Browne 
Wayne University 